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Abstract.  It  has  been  shown  in  [14]  that  the  accelerated  prox-level  (APL)  method  and  its  variant,  the  uniform  smoothing  level 
(USL)  method,  have  optimal  iteration  complexity  for  solving  black-box  and  structured  convex  programming  problems  without 
requiring  the  input  of  any  smoothness  information.  However,  these  algorithms  require  the  assumption  on  the  boundedness  of 
the  feasible  set  and  their  efficiency  relies  on  the  solutions  of  two  involved  subproblems.  These  hindered  the  applicability  of  these 
algorithms  in  solving  large-scale  and  unconstrained  optimization  problems.  In  this  paper,  we  first  present  a  generic  algorithmic 
framework  to  extend  these  uniformly  optimal  level  methods  for  solving  unconstrained  problems.  Moreover,  we  introduce  two  new 
variants  of  level  methods,  i.e.,  the  fast  APL  (FAPL)  method  and  the  fast  USL  (FUSL)  method,  for  solving  large  scale  black-box 
and  structured  convex  programming  problems  respectively.  Both  FAPL  and  FUSL  enjoy  the  same  optimal  iteration  complexity 
as  APL  and  USL,  while  the  number  of  subproblems  in  each  iteration  is  reduced  from  two  to  one.  Moreover,  we  present  an  exact 
method  to  solve  the  only  subproblem  for  these  algorithms.  As  a  result,  the  proposed  FAPL  and  FUSL  methods  have  improved 
the  performance  of  the  APL  and  USL  in  practice  significantly  in  terms  of  both  computational  time  and  solution  quality.  Our 
numerical  results  on  solving  some  large-scale  least  square  problems  and  total  variation  based  image  reconstruction  have  shown 
great  advantages  of  these  new  bundle-level  type  methods  over  APL,  USL,  and  some  other  state-of-the-art  first-order  methods. 
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1.  Introduction.  Given  a  convex  function  /  :  — )■  M,  the  main  problem  of  interest  in  this  paper  is: 

/*  :=  mm  f{x).  (1.1) 

Throughout  this  paper,  we  assume  that  the  solution  set  X*  of  (1.1)  is  nonempty.  Moreover,  denoting 

a;*  :=  argmin,j,{||s  —  a;||  :  a;  S  V*}  and  :=  ||x  —  a:*||  (1.2) 

for  a  given  initial  point  x  G  K",  we  assume  in  the  black-box  setting  that  for  all  closed  sets 

n  C  B{x,  AD*)  :=  {x  e  K”  :  ||x  -  x||  <  AD*}  ,  (1.3) 

there  exists  M{D)  >  0  and  p{D,)  G  [0, 1],  such  that^ 

f{y)-f{x)-{f{x),y-x)<^^^^^\\y-x\\'^+P^^\  Vx,yen.  (1.4) 

Here  ||  •  ||  denotes  the  Euclidean  norm,  and  /'(x)  G  df{x),  where  df{x)  denotes  the  subdifferential  of  /  at 
X  G  n.  Clearly,  the  above  assumption  on  /(•)  covers  non-smooth  (p(H)  =  0),  smooth  (p(H)  =  1)  and  weakly 
smooth  (0  <  p(fl)  <  1)  functions. 

Our  main  goal  in  this  paper  is  to  develop  a  bundle-level  method  that  is  able  to  compute  an  approximate 
solution  to  problem  (1.1),  with  uniformly  optimal  iteration  complexity  for  non-smooth,  smooth  and  weakly 
smooth  objective  functions  (see  [14]).  Here,  the  iteration  complexity  is  described  by  the  number  of  evaluations 
of  subgradients/gradients  of  /.  In  addition,  we  aim  to  design  the  bundle- level  method  so  that  its  computational 
cost  in  each  iteration  is  small,  in  order  to  improve  its  practical  performance.  Let  us  start  by  reviewing  a  few 
existing  bundle-level  methods. 
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1.1.  Cutting  plane,  bundle  and  bundle-level  methods.  The  bundle-level  method  originated  from 
the  well-known  Kelley’s  cutting-plane  method  in  1960  [8].  Consider  the  convex  programming  (CP)  problem 
of 


min  f{x), 

x^X 

(1.5) 

where  A  is  a  compact  convex  set  and  /  is  a  closed  convex  function.  The  fundamental  idea  of  the  cutting  plane 
method  is  to  generate  a  sequence  of  piecewise  linear  functions  to  approximate  /  in  X .  In  particular,  given 
Xi,X2, .  ■ .  ,Xk  G  X,  we  approximate  /  by 

mk{x)  :=  m.SK{h{xi,  x),  1  <  i  <  k}, 

(1.6) 

and  compute 

the  iterates  Xk+i  by: 

Xk+i  G  Argmin,jg^TOfc(a:), 

(1.7) 

where 

h{z,  x)  :=  f{z)  +  {fiz),x  -  z) . 

(1.8) 

Clearly,  the  functions  rrii,  i  =  1,2,...,  satisfy  mi{x)  <  mi+i{x)  <  f(x)  for  any  x  €  X,  and  are  identical 
to  /  at  those  search  points  Xi,  i  =  1, . . . ,  k.  However,  the  inaccuracy  and  instability  of  the  piecewise  linear 
approximation  rrik  over  the  whole  feasible  set  X  may  affect  the  selection  of  new  iterates,  and  the  above 
scheme  converges  slowly  both  theoretically  and  practically  [18,  20].  Some  important  improvements  of  Kelley’s 
method  have  been  made  in  1990s  under  the  name  of  bundle  methods  (see,  e.g.,  [9,  10,  16]).  In  particular,  by 
incorporating  the  level  sets  into  Kelley’s  method,  Lemarechal,  Nemirovskii  and  Nesterov  [16]  proposed  in  1995 
the  classic  bundle-level  (BL)  method  by  performing  a  series  of  projections  over  the  approximate  level  sets. 

Given  xi,X2,  ■  ■  ■  ,Xk,  the  basic  BL  iteration  consists  of  the  following  three  steps: 

a)  Set  f).  :=  min{/(a::i),  1  <  i  <  k}  and  compute  a  lower  bound  on  /*  by  =  mina;gx  'mk(x). 

b)  Set  the  level  Ik  =  f3£k  +  (1  ~  P)fk  some  (3  S  (0, 1). 

c)  Set  Xk  :=  {x  G  X  :  mk(x)  <  Ik}  and  determine  the  new  iterate 


Xfe+i  =  argmin„,gxj|x  -  Xfcll^.  (1.9) 

In  the  BL  method,  the  localizer  Xk  is  used  to  approximate  the  level  set  Lk  '■=  {x  :  f{x)  <  Ik},  because  the 
projection  over  Lk  is  often  too  difficult  to  compute.  Intuitively,  as  k  increases,  the  value  of  lk  will  converge 
to  /*,  and  consequently  both  Lk  and  Xk  will  converge  to  the  set  of  optimal  solutions  for  problem  (1.5).  It 
is  shown  in  [16]  the  number  of  BL  iterations  required  to  find  an  e-solution  of  (1.5),  i.e.,  a  point  x  G  X  s.t. 
f(x)  —  /*  <  e,  can  be  bounded  by  0{l/e^),  which  is  optimal  for  general  nonsmooth  convex  optimization. 

Observe  that  for  the  above  BL  methods,  the  localizer  Xk  accumulates  constraints,  and  hence  that  the 
subproblem  in  step  c)  becomes  more  and  more  expensive  to  solve.  In  order  to  overcome  this  difficulty,  some 
restricted  memory  BL  algorithms  have  been  developed  in  [10,  3].  In  particular,  Ben-Tal  and  Nemirovski  [3] 
introduced  the  non-Euclidean  restricted  memory  level  (NERML)  method,  in  which  the  number  of  extra  linear 
constraints  in  Xk  can  be  as  small  as  1  or  2,  without  affecting  the  optimal  iteration  complexity.  Moreover,  the 
objective  function  ||  •  ||^  in  (1.9)  is  replaced  by  a  general  Bregman  distance  d{-)  for  exploiting  the  geometry  of  the 
feasible  set  X.  NERML  is  often  regarded  as  state-of-the-art  for  large-scale  nonsmooth  convex  optimization  as 
it  substantially  outperforms  subgradient  type  methods  in  practice.  Some  more  recent  development  of  inexact 
proximal  bundle  methods  and  BL  methods  could  be  found  in  [27,  23,  22,  26,  12,  11,  7,  13]. 

1.2.  Accelerated  bundle- level  type  methods.  While  the  classic  BL  method  was  optimal  for  solving 
nonsmooth  CP  problems  only,  Lan  [14]  recently  significantly  generalized  this  method  so  that  they  can  opti¬ 
mally  solve  any  black-box  CP  problems,  including  non-smooth,  smooth  and  weakly  smooth  CP  problems.  In 
particular,  for  problem  (1.5)  with  compact  feasible  set  X,  the  two  new  BL  methods  proposed  in  [14],  i.e.,  the 
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accelerated  bundle-level(ABL)  and  accelerated  prox-level(APL)  methods,  can  solve  these  problems  optimally 
without  requiring  any  information  on  problem  parameters.  The  ABL  method  can  be  viewed  as  an  accelerated 
version  of  the  classic  BL  method.  Same  as  the  classic  BL  method,  the  lower  bound  on  f*  is  estimated  from  the 
cutting  plane  model  ruk  in  (1.6),  the  upper  bound  on  f*  is  given  by  the  best  objective  value  found  so  far,  and 
the  prox-center  is  updated  by  (1.9).  The  novelty  of  the  ABL  method  exists  in  that  three  different  sequences, 
i.e,  {xk}  and  are  used  for  updating  lower  bound,  prox-center,  and  upper  bound  respectively,  which 

leads  to  its  accelerated  iteration  complexity  for  smooth  and  weakly  smooth  problems.  The  APL  method 
is  a  more  practical,  restricted  memory  version  of  the  ABL  method,  which  can  also  employ  non-Euclidean 
prox- functions  to  explore  the  geometry  of  the  feasible  set  X. 

We  now  provide  a  brief  description  of  the  APL  method.  This  method  consists  of  multiple  phases,  and 
each  phase  calls  a  gap  reduction  procedure  to  reduce  the  gap  between  the  lower  and  upper  bounds  on  /*  by 
a  constant  factor.  More  specifically,  at  phase  s,  given  the  initial  lower  bound  Ibg  and  upper  bound  ub^,  the 
APL  gap  reduction  procedure  sets  /g  =  lbs,/o  =  ubs,Z  =  /3  •  Ib^  -|-  (1  —  /3)ubs,  and  iteratively  performs  the 
following  steps. 

a)  Set  =  {1  —  ak)x'^_^  -I-  akXk-i  and  j_k  :=  max{/j,_2,  min{/, /ife}},  where 

hk-=  min  h{xi,x).  (1.10) 

xGXk_i 

b)  Update  the  prox-center  Xk  by 

Xk  =  argmin,^g^^d(a;),  (1.11) 

where  :=  {a;  S  Xk-i  ■  h{x^f,,x)  <  1}. 

c)  Set  fk  =  min{/j,_]^,  /(ifc)},  where  =  akXk  -I-  (1  —  ak)x'{._^,  and  is  chosen  as  either  or  x'^_^ 
such  that  /(x^)  =  f/.. 

d)  Choose  Xk  such  that  Xk  Q  Q  Xk,  where  Xk  :=  {x  £  AT  :  {Xd{xk),x  —  Xk)  >  0}. 

Here  Vd(-)  denotes  the  gradient  of  d(-).  Observe  that  the  parameters  ak  and  (3  are  fixed  a  priori,  and  do  not 
depend  on  any  problem  parameters.  Moreover,  the  localizer  Xk  is  chosen  between  and  Xk,  so  that  the 
numbers  of  linear  constraints  in  the  two  subproblems  (1.10)  and  (1.11)  can  be  fully  controlled.  It  is  shown 
in  [14]  that  for  problem  (1.5)  with  compact  feasible  set  X,  the  APL  method  achieves  the  optimal  iteration 
complexity  for  any  smooth,  weakly  smooth  and  non-smooth  convex  functions.  Moreover,  by  incorporating 
Nesterov’s  smoothing  technique  [21]  into  the  APL  method,  Lan  also  presented  in  [14]  the  uniform  smoothing 
level  (USL)  method  which  can  achieve  the  optimal  complexity  for  solving  an  important  class  of  nonsmooth 
structured  saddle  point  (SP)  problems  without  requiring  the  input  of  any  problem  parameters  (see  Subsection 
3.2  for  more  details). 

1.3.  Contribution  of  this  paper.  One  crucial  problem  associated  with  most  existing  BL  type  methods, 
including  APL  and  USL,  is  that  each  phase  of  these  algorithms  involves  solving  two  optimization  problems; 
first  a  linear  programing  problem  to  compute  the  lower  bound,  and  then  a  constrained  quadratic  programing 
problem  to  update  the  prox-center  or  new  iterate.  In  fact,  the  efficiency  of  these  algorithms  relies  on  the 
solutions  of  the  two  involved  subproblems  (1.10)  and  (1.11),  and  the  latter  one  is  often  more  complicated  than 
the  projection  subproblem  in  the  gradient  projection  type  methods.  Moreover,  most  existing  BL  type  methods 
require  the  assumption  that  the  feasible  set  is  bounded  due  to  the  following  two  reasons.  Firstly,  the  feasible 
set  has  to  be  bounded  to  compute  a  meaningful  lower  bound  by  solving  the  aforementioned  linear  programing 
problem.  Secondly,  the  convergence  analysis  of  limited  memory  BL  type  methods  (e.g.,  NERML,  APL,  and 
USL)  relies  on  the  assumption  that  the  feasible  set  is  compact.  These  issues  have  significantly  hindered  the 
applicability  of  existing  BL  type  methods.  Our  contribution  in  this  paper  mainly  consists  of  the  following 
three  aspects. 

Firstly,  we  propose  a  novel  bundle-level  framework  for  unconstrained  CP  problems.  The  proposed  frame¬ 
work  solves  unconstrained  CP  problems  through  solutions  to  a  series  of  ball-constrained  CPs.  In  particular, 
if  there  exists  a  uniformly  optimal  method  (e.g.,  the  APL  and  USL  methods)  that  solves  ball-constrained 
black-box  or  structured  CPs,  then  the  proposed  algorithm  solves  unconstrained  black-box  or  structured  CPs 
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with  optimal  complexity  without  requiring  the  input  of  any  problem  parameters  as  well.  To  the  best  of  our 
knowledge,  this  is  the  first  time  in  the  literature  that  the  complexity  analysis  has  been  performed  for  BL  type 
methods  to  solve  unconstrained  CP  problems  (see  Sections  3.2  and  3.3  in  [24]  for  more  details). 

Secondly,  in  order  to  solve  ball-constrained  CPs,  we  propose  two  greatly  simplified  BL  type  methods, 
namely  the  FAPL  and  FUSL  methods,  which  achieve  the  same  optimal  iteration  complexity  as  the  APL 
and  USL  methods  respectively,  and  maintain  all  the  nice  features  of  those  methods,  but  has  greatly  reduced 
computational  cost  per  iteration.  Such  improvement  has  been  obtained  by  the  reduction  and  simplification  of 
the  subproblems  that  have  to  be  solved  in  the  APL  and  USL  methods.  More  specifically,  we  show  that  the 
linear  optimization  subproblem  for  computing  the  lower  bound  can  be  eliminated  and  that  the  ball  constraint 
can  be  removed  from  the  quadratic  subproblem  by  properly  choosing  the  prox-functions.  We  also  generalize 
both  FAPL  and  FUSL  methods  for  solving  strongly  convex  optimization  problems  and  show  that  they  can 
achieve  the  optimal  iteration  complexity  bounds. 

Thirdly,  we  introduce  a  simple  exact  approach  to  solve  the  only  subproblem  in  our  algorithms.  As 
mentioned  earlier,  the  accuracy  of  the  solutions  to  the  subproblems  is  essential  for  the  efficiency  of  all  these 
BL  type  methods  mentioned  in  Sections  1.1  and  1.2.  By  incorporating  the  proposed  exact  approach  to  solve 
the  only  subproblem  in  our  algorithm,  the  accuracy  of  the  FAPL  and  FUSL  methods  is  significantly  increased 
and  the  total  number  of  the  iterations  required  to  compute  an  e-solution  is  greatly  decreased  comparing 
with  the  original  APL  and  USL  methods  and  other  first  order  methods.  Also  when  the  number  of  linear 
constraints  is  fixed  and  small,  the  computational  cost  for  solving  the  only  subproblem  only  linearly  depends 
on  the  dimension  of  the  problem,  since  the  cost  for  computing  the  vector  inner  product  will  dominate  that  for 
solving  a  few  auxiliary  linear  systems.  This  feature  is  very  important  for  solving  large-scale  CP  problems. 

Finally,  we  present  very  promising  numerical  results  for  these  new  FAPL  and  FUSL  methods  applied 
to  solve  large-scale  least  square  problems  and  total-variation  based  image  reconstruction  problems.  These 
algorithms  significantly  outperform  other  BL  type  methods,  gradient  type  methods,  and  even  the  powerful 
MATLAB  solver  for  linear  systems,  especially  when  the  dimension  and/or  the  Lipschitz  constant  of  the  problem 
is  large. 

It  should  be  noted  that  there  exist  some  variants  of  bundle-level  methods  [4,  7,  2]  for  solving  nonsmooth 
CP  problems  in  which  the  computation  of  the  subproblem  to  update  the  lower  bound  is  skipped,  so  that 
the  feasible  set  X  is  allowed  to  be  unbounded.  In  each  iteration,  these  methods  apply  a  level  feasibility 
check  recursively  in  order  to  find  a  proper  level  Ik  and  the  associated  level  set  Xk,  and  update  to  Ik  if 
the  level  set  associated  with  Ik  is  empty.  However,  in  these  methods,  repeatedly  checking  the  emptiness  of 
level  sets  associated  with  varied  levels  in  each  iteration  may  be  very  costly  in  practice,  especially  when  the 
linear  constraints  in  the  level  sets  accumulate.  Also,  if  the  feasible  set  X  =  K",  then  for  any  chosen  level, 
the  corresponding  level  set  consisting  of  linear  constraints  is  unlikely  to  be  empty,  which  would  result  in  the 
inefficiency  for  updating  the  lower  bound.  In  [2],  an  alternative  for  updating  the  lower  bound  (or  increasing 
the  level)  is  introduced  by  comparing  the  distances  from  stability  center  to  the  newly  generated  iterate  and 
the  solution  set.  This  approach  requires  some  prior  knowledge  about  the  distance  to  the  solution  set,  and  a 
rough  estimate  for  that  may  lead  to  incorrect  lower  bounds  and  improper  choices  of  levels. 

1.4.  Organization  of  the  paper.  The  paper  is  organized  as  follows.  In  Section  2,  we  present  a  general 
scheme  to  extend  the  optimal  BL  type  methods  for  unconstrained  convex  optimization.  In  Section  3,  the 
new  FAPL  and  FUSL  methods  are  proposed  followed  by  their  convergence  analysis,  then  an  exact  approach 
is  introduced  to  solve  the  subproblem  in  these  algorithms.  We  also  extend  the  FAPL  and  FUSL  methods  to 
strongly  convex  CP  and  structured  CP  problems  in  Section  4,  following  some  unpublished  developments  by 
Lan  in  [15].  The  applications  and  promising  numerical  results  are  presented  in  Section  5. 

2.  Solving  unconstrained  CP  problems  through  ball-constrained  CP.  Our  goal  in  this  section 
is  to  present  a  generic  algorithmic  framework  to  extend  the  uniformly  optimal  constrained  BL  algorithms  in 
[14]  for  solving  unconstrained  problems. 

Given  x  S  K",  i?  >  0,  e  >  0,  let  us  assume  that  there  exists  a  first-order  algorithm,  denoted  by  A{x,  R,  e), 
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which  can  find  an  e-solution  of 


f^R--  (2-1) 

xeB(x,R) 

where  B{x,  R)  is  defined  in  (1.3).  In  other  words,  we  assume  that  each  call  to  A{x,  R,  e)  will  compute  a  point 
2;  £  B{x,R)  such  that  f{z)  —  <  e.  Moreover,  throughout  this  section,  we  assume  that  the  iteration 

complexity  associated  with  A(x,  R,  e)  is  given  by 

,,  C^{x,R,f)R-^  , 

Nx,R,e  ■=  +  gfe  (^.2) 

where  ai  >  /3i  >  0, 02  >  /?2  >  0  and  Ci{x,  R,  f),C2{x,  R,  f)  are  some  constants  that  depend  on  /  in  (2.1). 
For  example,  if  /  is  a  smooth  convex  function,  V/  is  Lipschitz  continuous  in  K”  with  constant  L,  i.e.,  (1.4) 
holds  with  p(M")  =  1  and  M(IR")  =  L,  and  we  apply  the  APL  method  to  (2.1),  then  we  have  only  one  term 
with  ai  =  1,  /3i  =  1/2,  and  Ci{x,  R,  f)  =  cL  in  (2.2),  where  c  is  a  universal  constant.  Observe  that  the 
two  complexity  terms  in  (2.2)  will  be  useful  for  analyzing  some  structured  CP  problems  in  Section  3.2.  It 
should  also  be  noted  that  a  more  accurate  estimate  of  Ci{x,  R,  /)  is  cM{B{x,  R)),  since  the  Lipschitz  constant 
L  =  M(M"')  throughout  K"  is  larger  than  or  equal  to  the  local  Lipschitz  constant  in  B{x,R). 

By  utilizing  the  aforementioned  ball-constrained  CP  algorithm  and  a  novel  guess  and  check  procedure,  we 
present  a  bundle-level  type  algorithm  for  unconstrained  convex  optimiations  as  follows. 


Algorithm  1  Bundle-level  type  methods  for  unconstrained  CP  problems 

Choose  initial  estimation  tq  <  ||a;  —  a:*||  and  compute  the  initial  gap  Ag  :=  f{x)  —  min2,gB(j h(x,x). 
For  /c  =  0, 1,  2, . . ., 

1.  Set  x'f.  =  A{x,  r/c,  Afc)  and  x'l  =  A{x,  2rfe,  Afc). 

2.  If  fix'f.)  —  fix'l.)  >  Afc,  update  2rfe  and  go  to  step  1. 

3.  Otherwise,  let  x*^  =  S/,  A^+i  =  Afc/2  and  =  r^. 


Step  1  and  Step  2  in  Algorithm  1  constitute  a  loop  to  find  a  pair  of  solution  [x'^.x'D  satisfying  0  < 
fi^'k)  -  fi'^k)  <  Since  x/  and  x'l  are  A^-optimal  solutions  to  min^(zB{x,rk)  fi^)  ^nd  T[nin^^B{x,2rk)  fix), 
respectively,  this  loop  must  terminate  in  finite  time,  because  it  will  terminate  whenever  rk  >  D* ,  where  D*  is 
defined  in  (1.2).  For  simplicity,  we  call  it  an  expansion  if  we  double  the  radius  in  Step  2,  and  each  iteration 
may  contain  several  expansions  before  updating  the  output  solution  xl  in  step  3. 

Before  analyzing  the  rate  of  convergence  for  Algorithm  1,  we  discuss  some  important  observations  related 
to  the  aforementioned  expansions. 

Lemma  2.1.  Suppose  that  x  is  a  fixed  point  and  R  >  0  is  a  fixed  constant.  Let  xi  and  X2  be  e-solutions 
to  problems 


ff  :=  min  f{x) 

xeB{x,R) 


and  /I  := 


x^B{x,2R) 


fix), 


(2.3) 


respectively.  If0<  fixi)  —  fix2)  <  e,  then  we  have 

f  2D*  \ 

fix2)-r<(3+—y,  (2.4) 

where  f*  and  D*  are  defined  in  (1.1)  and  (1.2)  respectively. 

Proof  Clearly,  by  definition,  we  have  \\xi  —  x||  <  R,  \\x2  —  S||  <  2ii,  0  <  f(xi)  —  ff  <  e,  and  0  < 
fix2)  —  fi  ^  It  suffices  to  consider  the  case  when  f^  >  f*  and  \\x*  —  x||  >  2R,  since  otherwise  (2.4) 
holds  trivially.  Suppose  x\  and  X2  are  the  solutions  to  the  first  and  second  problems  in  (2.3)  respectively, 
let  X  be  the  intersection  of  the  line  segment  (a::*,a;*)  with  the  ball  B{x,2R),  and  denote  i?i  :=  \\x  —  a;*||  and 
i?2  :=  ||a;*  —  Clearly,  x  =  (1  —  ^)x\  -\-  ^x*.  By  the  convexity  of  /(•),  we  have 


which  implies  that 


^[/(a^i)  -  fix*)]  <  fix*i)  -  fix), 


(2.6) 


and  that  f{x)  <  /(x*)  due  to  the  fact  that  fix*)  <  fixl).  Also,  we  have  f{x)  >  fix^)  since  x  €  B{x,  2R).  In 
addition, 


fix*l)  -  fiX2)  =  [fix*i)  -  fixi)]  +  [fixi)  -  f{x2)]  +  [fix2)  -  fix^)] 
^  0  -|-  6  “h  C  =  26. 

Combining  the  previous  inequalities,  we  obtain 

f{x*)  -  2e  <  fix^)  <  fix)  <  f{x*), 


(2.7) 

(2.8) 


(2.9) 


which  implies  that  /(x^)  —  f{x)  <  2e.  Using  the  previous  conclusion  (2.6),  and  the  fact  that  Ri  >  R  and 
R2  <  D*  +  R,  we  have 


fix*,) -fix*)  <^<(2+^ 


e. 


Therefore, 


fix2)  -  fix*)  <  fixi)  -  fix*)  <  [fixi)  -  fix*,)]  +  [fix*,)  -  fix*)]  <  3  + 


2D* 

~R 


e. 


We  are  now  ready  to  prove  the  iteration  complexity  of  Algorithm  1  for  solving  the  unconstrained  CP 
problem  in  (1.1). 

Theorem  2.2.  Suppose  that  the  number  of  evaluations  of  f  in  one  call  to  A(x,R,e)  is  bounded  by  (2.2), 
and  denote  Ck  '■=  fixi,)  —  f*  for  the  iterates  {x^}  of  Algorithm  1.  Then  we  have 

a)  rk  <  2D*  for  all  k; 

b)  limfc_,.oo  Cfe  =  0; 

c)  The  total  number  of  evaluations  of  f  performed  by  Algorithm  1  up  to  the  k-th  iteration  is  bounded  by 


O 


Ci(x,4U*,/)(U*)“i  ,  C2(x,4U*,/)(U*)“^ 


(2.10) 


where  D*  is  defined  in  (1.2). 

Proof.  We  start  by  proving  that  <  2D*  for  all  k.  From  the  description  of  Algorithm  1,  we  see  that 
expansions  occur  if  and  only  if  /(x),)— /(xj()  >  Ak  at  Step  2.  Moreover,  we  can  observe  that  /(x).)  — /(x)()  < 
if  X*  £  i3(x,  Tfc).  This  observation  implies  that  rk  <  2D*.  Indeed,  it  is  easy  to  see  that  the  total  number  of 
expansions  is  bounded  by 


+  1. 


To  prove  b),  noting  from  the  description  of  Step  3  and  Lemma  2.1  that 


efc  =  fK)  -f*< 


(2.11) 


(2.12) 


Since  the  total  number  of  expansions  is  bounded  by  ^i,  and  decreases  to  0  as  fc  increases,  we  have 
lim^—^oo  —  0- 
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To  prove  c),  assume  that  the  number  of  executions  of  Step  1  performed  by  A  to  find  xl  is  K.  Now  we 
can  estimate  the  total  number  of  evaluations  of  f  performed  by  the  K  executions.  For  any  Q  <  j  <  K,  as 
>  2^1  >  1  and  2“^  >  2^^  >  1,  we  have 


N'^+i  >  2^1  TVj  and  iV"+i  > 


(2.13) 


by  the  assumption  of  where  Nj  =  Nj  +  N”  denotes  the  bound  on  iteration  number  for  the 

execution,  and  Nj,N”  correspond  with  the  first  and  second  terms  in  (2.10)  respectively.  Then  the  total 
number  of  iterations  is  bounded  by 


K  K-l  K-l 

i=l  3=0  j=0 

+  O0  +00 


< 


< 


iV),^2-^^^+iV"^2 


-^23 


< 


1 


3=0 


3=0 


1  -  2-Pi 


N'k  + 


^  1  _  2-/32 


(l  +  2“0Ci(5J,2rfe,/)  ,  (1  +  2“^)C2(x,  2rfc, /)  rf 


1  -  2- A 

Combining  the  above  inequality  with  (2.12),  we  have 


1  -  2-/52 


Af^' 


(2.14) 

(2.15) 

(2.16) 


Ar  ^  (1  +  2“‘)C'i(a;,  2rfc,  /) 


(2.17) 


Since  ai  >  j3i  >  0,  then  r^’(3  +  q^)^‘  =  ^’(Sr^  +  2D*)^'  for  *  =  1,2  is  monotonically  increasing  with 

respect  to  r^,  which,  in  view  of  the  fact  <  2D*  for  any  A:  >  0,  then  clearly  implies 


(1  +  2“-)2“-+3/5.C,(x,4D*,/) 
^  '2/5. -1 

2=1 


(2.18) 


Hence  the  proof  is  complete.  □ 

Note  that  to  solve  the  unconstrained  black-box  CP  problem  (1.1),  the  termination  criterions  of  most  first- 
order  algorithms  are  based  on  the  residual  of  the  gradient  or  gradient  mapping,  which  would  lead  to  different 
complexity  analysis.  To  the  best  of  our  knowledge,  without  any  prior  information  on  D*,  there  is  no  any 
termination  criterion  based  on  functional  optimality  gap  that  could  guarantee  the  termination  of  algorithms 
for  finding  an  e-solution  of  (1.1).  Comparing  to  Nesterov’s  optimal  gradient  method  for  unconstrained  problems 
in  [19],  Algorithm  1  only  provides  efficiency  estimates  about  e^  :=  fix*/,)  —  f*  when  the  output  x^.  is  updated, 
while  the  optimal  gradient  method  could  have  estimates  about  Ck  ■=  f{xk)  —  f*  for  each  iterate  Xk-  For 
both  methods  the  efficiency  estimates  involve  D* .  Since  Algorithm  1  extend  methods  for  ball-constraint  CP 
problems  to  solve  (1.1),  and  the  iterations  in  the  expansions  of  Algorithm  1  could  be  regarded  as  a  guess 
and  check  procedure  to  determine  D*,  it  is  reasonable  that  the  efficiency  estimates  are  only  provided  for 
unexpansive  steps  which  update 

Since  in  [14]  the  APT  method,  and  its  variant  the  USL  method,  have  optimal  iteration  complexity  for 
solving  smooth,  nonsmooth,  weakly  smooth  CP  problems  and  structured  nonsmooth  CP  problems  on  compact 
feasible  sets,  Algorithm  1  could  be  incorporated  to  solve  (1.1)  by  specifying  feasible  sets  to  a  sequences  of 
Euclidean  balls.  Therefore,  the  main  problem  remained  is  how  to  improve  the  efficiency  of  these  BL  type 
methods  for  solving  ball-constrained  CP  problems. 

3.  Fast  prox-level  type  methods  for  ball-constrained  and  unconstrained  problems.  This  section 
contains  four  subsections.  We  first  present  a  much  simplified  APT  method,  referred  to  the  fast  APT  (FAPL) 
method,  for  solving  ball-constrained  black-box  CP  problems  in  Subsection  3.1,  and  then  present  the  fast  USL 
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(FUSL)  method  for  solving  a  special  class  of  ball-constrained  structured  CP  problems  in  Subsection  3.2.  We 
show  how  to  solve  the  subproblems  in  these  two  algorithms  in  Subsection  3.3.  We  also  briefly  discuss  in 
Subsection  3.4  the  applications  of  the  FAPL  and  FUSL  methods  for  unconstrained  optimization,  based  on 
our  results  in  Section  2.  For  the  sake  of  simplicity,  throughout  this  section,  we  denote  p  =  p{B{x,Rj),M  = 
M{B{x,R))  for  (2.1). 

3.1.  FAPL  for  ball-constrained  black-box  problems.  Our  goal  in  this  subsection  is  to  present  the 
FAPL  method,  which  can  significantly  reduce  the  iteration  cost  for  the  APL  method  applied  to  problem  (2.1). 
In  particular,  we  show  that  only  one  subproblem,  rather  than  two  subproblems  (see  (1.10)  and  (1.11))  as  in 
the  APL  method,  is  required  in  the  FAPL  method  for  defining  a  new  iterate  (or  prox-center)  and  updating 
lower  bound.  We  also  demonstrate  that  the  ball  constraint  in  (2.1)  can  be  eliminated  from  the  subproblem 
by  properly  specifying  the  prox-function. 

Similarly  to  the  APL  method,  the  FAPL  method  consists  of  multiple  phases,  and  in  each  phase  the  FAPL 
gap  reduction  procedure,  denoted  by  Qfapl^  is  called  to  reduce  the  gap  between  the  upper  and  lower  bounds 
on  /I  ^  in  (2.1)  by  a  constant  factor. 

We  start  by  describing  the  FAPL  gap  reduction  procedure  in  Procedure  1.  This  procedure  differs  from 
the  gap  reduction  procedure  used  in  the  APL  method  in  the  following  several  aspects.  Firstly,  the  feasible 
sets  Qi^  and  Qj.  (see  steps  1  and  4)  in  procedure  Gfapl  only  contain  linear  constraints  and  hence  are  possibly 
unbounded,  while  the  localizers  in  the  APL  method  must  be  compact.  Secondly,  we  eliminate  the  subproblem 
that  updates  the  lower  bound  on  /*  in  the  APL  method.  Instead,  in  the  FAPL  method,  the  lower  bound 
is  updated  to  I  directly  whenever  Qj^  =  0  or  Ijccfc  —  a;|j  >  R.  Thirdly,  we  choose  a  specific  prox-function 
d{x)  =  }^\\x  —  x\\^,  and  the  prox-center  is  fixed  to  be  x.  As  a  result,  all  the  three  sequences  {xk},{x^k}  and 
{a:^}  will  reside  in  the  ball  B{x,R).  At  last,  as  we  will  show  in  next  subsection,  since  the  subproblem  (3.4) 
only  contains  a  limited  number  of  linear  constraints  (depth  of  memory),  we  can  solve  it  very  efficiently,  or 
even  exactly  if  the  depth  of  memory  is  small,  say,  <  10. 

We  now  add  a  few  more  remarks  about  the  technical  details  of  Procedure  1.  Firstly,  Procedure  1  is 
terminated  at  step  2  if  =  0  or  \\xk  —  5;||  >  R,  which  can  be  checked  automatically  when  solving  the 
subproblem  (3.4)  (see  Subsection  3.3  for  more  details).  Secondly,  in  step  4,  while  Qk  can  be  any  polyhedral 
set  between  and  Q^.,  in  practice  we  can  simply  choose  Qk  to  be  the  intersection  of  the  half-space  {x  G  M”  : 
(xk  —  x,x  —  Xk)  >  0}  and  a  few  most  recently  generated  half-spaces,  each  of  which  is  defined  by  {a:  S  K”  : 
h{x\.,x)  <  1}  for  some  1  <t  <  k.  Finally,  in  order  to  guarantee  the  termination  of  procedure  Gfapl  and  the 
optimal  iteration  complexity,  the  parameters  {ak}  used  in  this  procedure  need  to  be  properly  chosen.  One 
set  of  conditions  that  {ak}  should  satisfy  to  guarantee  the  convergence  of  procedure  Gfapl  is  readily  given 
in  [14]: 


oi  =  1,  0  <  a/c  <  1,  7fc||Tfc(p)||^  <ck  2  ^  Vfc  >  1 

1-p 


for  some  constants  c  >  0,  where 


lip  denotes  the  Ip  norm, 


7/c  := 


1, 


k  =  1, 


7fc_i(l-Q;fc),  k>2, 


and  Tk{p)  := 


i+p  „i+P 


7i 


72 


J+P 


Ik 


(3.8) 


(3.9) 


The  following  lemma,  whose  proof  is  given  in  [14],  provides  two  examples  for  the  selection  of  {ak}- 
Lemma  3.1. 

a)  If  ak  =  2/(fc  -f  1),  fc  =  1,  2, . . .,  then  the  condition  (3.8)  is  satisfied  with  c  =  2^+^3  ^  . 

b)  If  {ak}  is  recursively  defined  by 


«!  =  7i  =  1:  “fc  =  (1  -  afc)7fc-i  =  7fc,  Vfc  >  2, 


(3.10) 


then  the  condition  (3.8)  holds  with  c  =  4/3  2^  . 

The  following  lemma  describes  some  important  observations  regarding  the  execution  of  procedure  G fapl  ■ 


Procedure  1  The  FAPL  gap  reduction  procedure:  (a:+,lb'^)  =  QFAPL{x,]b,R,x,  (3,9) 

0:  Set  fc  =  1,  /g  =  /(x),  ^  =  /3  •  lb  +  (1  —  /3)/o,  Qo  =  and  Xg  =  x.  Let  Xg  e  B{x,  R)  be  given  arbitrarily. 
1;  Update  the  cutting  plane  model: 


h{xi,x) 

Q-k 


(1  -  Q;fc)Xfc_i  +  akXk-i, 

(3.1) 

f{xi)-G{f{xi),x-xi), 

(3.2) 

{x  G  Qk-i  :  h{Xk,x)  <  1}. 

(3.3) 

2:  Update  the  prox-center  and  lower  bound: 


Xk  =  argmin^gQ^ 


If  Qfc  =  0  or  ||xfc  —  x||  >  i?,  then  terminate  with  outputs  x+  =  x^_;^,  Ib^  =  1. 
3:  Update  the  upper  bound:  set 


(3.4) 


Xfe  =  (1  -  afc)xfc_i  +  OfcXfe, 


X 


U 

k 


u 
k  ’ 


u 

k— 


if  /(ife)  <  fk, 

otherwise, 


and  fk  =  /(xfc).  If  /fc  <  /  +  0(/o  —  1),  then  terminate  with  x+  =  x^,  lb'*’  =  lb. 
4:  Choose  any  polyhedral  set  Qk  satisfying  Q_k  Qk  Qk^  where 


(3.5) 

(3.6) 


Qk  :=  {x  e  K"  :  {xk  —  x,x  —  Xk)  >  0}. 


(3.7) 


Set  k  =  k  +  \  and  go  to  step  1. 


Lemma  3.2.  Let  Sf{l)  :=  {x  G  B{x,R)  :  /(x)  <  1}.  If  £f{l)  ^  0,  then  the  following  statements  hold  for 
proeedure  Gfapl- 

a)  Step  4  'is  always  well-defined  unless  procedure  Gfapl  already  terminated. 

b)  £f{l)  ^  Qk  ^  Qk  Qk  for  any  k>l. 

c)  IfQk  7^  07  then  problem  (3.4)  in  step  2  has  a  unique  solution.  Moreover,  if  procedure  Gfapl  terminates 
at  step  2,  then  I  <  f* . 

Proof  To  prove  part  a),  we  will  use  induction  to  prove  that  £f{l)  C  Qk  for  all  A:  >  0.  Firstly,  as  Qq  is  set 
to  K",  we  have  £f{l)  C  Qq.  Moreover,  if  £f{l)  C  Qk-i  for  some  fc  >  1,  then  from  the  definition  of  Qk  in  (3.3) 
and  the  observation  that  /i(x(,,x)  <  /(x)  <  I  for  all  x  G  £f{l),  we  have  £f{l)  Q  Qk  f=  Qk,  hence  part  a)  holds. 

To  prove  b),  it  suffices  to  show  that  Qk  Q  Qk,  since  Qk  is  chosen  between  Qk  and  Qk,  and  £f{l)  C  Qk  is 
proved  from  the  above  induction.  By  the  definition  of  Qk  in  (3.7),  we  have  =  {x  S  K"  :  d{x)  >  d{xk)}, 
hence  Qk  C  Qk,  and  part  b)  holds. 

We  now  provide  the  proof  of  part  c).  From  the  definition  of  Qk  in  step  4  and  the  definition  of  Qk  in  (3.3) 
we  can  see  that  Qk  is  the  intersection  of  half-spaces,  hence  it  is  convex  and  closed.  Therefore,  the  subproblem 
(3.4)  always  has  a  unique  solution  as  long  as  Qk  is  non-empty. 

To  finish  the  proof  it  suffices  to  show  that  £f{l)  =  0  when  either  Qk  =  %  or  \\xk  —  x||  >  R,  which  can  be 
proved  by  contradiction.  Firstly,  if  Sfc  =  0  ^/(O  7^  07  fben  by  part  b)  proved  above,  we  have  £f{l)  C  Qk, 

which  contradicts  the  assumption  that  Qk  is  empty.  On  the  other  hand,  supposing  that  \\xk  —  x||  >  R  and 
£}{l)  7^  0,  let  x*F  :=  argmin„,^F{x, r)f{x),  it  is  clear  that  x^  G  ^/(O  ^  Qk  by  b),  however  ||x)j  —  x||  <  R  < 
\\xk  —  x||  which  contradicts  the  definition  of  Xk  in  (3.4).  □ 

The  following  lemma  shows  that  whenever  procedure  Gfapl  terminates,  the  gap  between  the  upper  and 
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lower  bounds  on  /|  is  reduced  by  a  constant  factor. 

Lemma  3.3.  Let  ub  :=  /(i:),ub’^  :=  /(a;+)  in  procedure  Gfapl-  Whenever  procedure  Gfapl  terminates, 
we  have  ub'*'  —  lb’*'  <  q(ub  —  lb),  where 


q  :=  inax{/3, 1  —  (1  —  0)f3}. 


(3.11) 


Proof.  By  the  definition  of  x'^  in  (3.5)  and  the  definition  of  f/.  in  step  3  of  procedure  Gfapl,  we  have 
fk  —  fk-i,  VA:  >  1,  which  implies  ub’*’  <  ub.  Procedure  Gfapl  could  terminate  at  either  step  2  or  3. 
We  first  suppose  that  it  terminates  at  step  2  after  k  iterations.  Using  the  termination  condition  lb’*’  =  I  = 
/3  •  lb  +  (1  —  /?)ub,  we  have 


ub^  —  lb'*’  <  ub  —  ^ lb  +  (1  —  /3)ub  =  /3(ub  —  lb). 

Now  suppose  that  Procedure  1  terminates  at  step  3  after  k  iterations.  We  have  ub^  =  /fc  <  ^  +  ^*(ub  —  /)  and 
lb'*’  >  lb.  Using  the  fact  that  Z  =  /3  •  lb  +  (1  —  /3)ub,  we  conclude  that 

ub+  -  lb+  <  Z  +  6»  (ub  -  Z)  -  lb  =  [1  -  (1  -  6»)^](ub  -  lb). 

We  conclude  the  lemma  by  combining  the  above  two  relations.  □ 

We  now  provide  a  bound  on  the  number  of  iterations  performed  by  procedure  Gfapl-  Note  that  the  proof 
of  this  result  is  similar  to  Theorem  3  in  [14]. 

Proposition  3.4.  If  the  stepsizes  {oLk}k>i  xre  chosen  such  that  (3.8)  holds,  then  the  number  of  iterations 
performed  by  proeedure  Gfapl  does  not  exceed 


N{A)  := 


cMR^^P  ^ 

{i  +  p)epA) 


1, 


(3.12) 


where  A  :=  ub  —  lb. 

Proof  It  can  be  easily  seen  from  the  definition  of  that  Xk  =  argmin^gQ^cZ(a:),  which,  in  view  of  the  fact 
that  Qk  C  Qfc,  then  implies  that  Xk  =  argmin,,.gQ^ cZ(x).  Using  this  observation,  and  the  fact  that  Xk+i  S  Qk 
due  to  (3.3)  and  (3.4),  we  have  {Vd(xk),Xk+i  —  Xk)  >  0.  Since  d(x)  is  strongly  convex  with  modulus  1,  we 
have 


d{,Xk+i)  >  d{xk)  +  {Vd{xk),Xk+i  -  Xk)  +  ^ll^fe+i  “  ^kW^- 

Combining  the  above  two  relations,  we  conclude  ^||xfc+i  —  Xk\\^  <  d{xk+i)  —  d{xk).  Summing  up  these 
inequalities  for  any  k>  1,  we  conclude  that 

k 

\  ^  \\xr+i  -  a;r||  <  d{xk+i)  =  ]^\\xk+i  -  (3.13) 

Now  suppose  that  procedure  Gfapl  does  not  terminate  at  the  iteration.  Applying  the  relation  (3.14)  in 
[14],  and  notice  that  a\  =  1,  we  have 

f(xl)  -  I  <  -^[2cZ(a;fe)]^7fc|lrfc(p)||^.  (3.14) 

I  +  p  1-'’ 

In  view  of  steps  2  and  3  in  procedure  1,  and  using  the  fact  that  Z  =  •  lb  +  (1  —  /3)ub  in  step  0,  we  have 

/(a;)))  -  Z  >  6»(ub  -  Z)  =  6»/3A. 
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Combining  the  above  two  relations,  and  using  (3.8)  and  (3.13),  we  obtain 


which  implies  that 


□ 


6I/3A  < 


MW-+P 

Ti+py 


c 

,  l+3p  I 
K  2 


k  < 


cMR^+P  \  1+"'’ 

{l  +  p)ei3A) 


(3.15) 


(3.16) 


In  view  of  Lemma  3.3  and  Proposition  3.4,  we  are  now  ready  to  describe  the  FAPL  method,  which  performs 
a  sequence  of  calls  to  procedure  Qfapl  until  an  approximate  solution  with  sufficient  accuracy  is  found. 


Algorithm  2  The  fast  accelerated  prox-level  (FAPL)  method 
0:  Given  ball  B{x,R),  choose  initial  point  po  G  B{x,R),  tolerance  e  >  0  and  parameters  P,0  G  (0, 1). 

1;  Set  Pi  G  Argmm^^gfj^  j^^h{po,x),  Ibi  =  /i(po,pi),  ubi  =  min{/(po), /(pi)},  let  Xi  be  either  po  or  pi  such 
that  f{xi)  =  ubi,  and  s  =  1. 

2:  If  ubg  —  lbs  <  e,  terminate  and  output  approximate  solution  Xg- 
3:  Set  (is+i,lbs+i)  =  gFAPLixs,Vos,R,x,/3,0)  and  ubs+i  =  f{xs+i). 

4:  Set  s  =  s  +  1  and  go  to  step  2. 


A  phase  of  the  FAPL  method  occurs  whenever  s  increments  by  1.  For  the  sake  of  simplicity,  each  iteration 
of  procedure  Qfapl  is  also  referred  to  as  an  iteration  of  the  FAPL  method.  The  following  theorem  establishes 
the  complexity  bounds  on  the  total  numbers  of  phases  and  iterations  performed  by  the  FAPL  method  and  its 
proof  is  similar  to  that  of  Theorem  4  in  [14] . 

Theorem  3.5.  If  the  stepsizes  {ak\  in  procedure  Qfapl  are  chosen  such  that  (3.8)  holds,  then  the 
following  statements  hold  for  the  FAPL  method. 

a)  The  number  of  phases  performed  by  the  FAPL  method  does  not  exceed 


S  := 


max 


0,  log! 

q 


l{2Rf+PM\\ 

I  (1  +  P)e  )} 


(3.17) 


b)  The  total  number  of  iterations  performed  by  the  FAPL  method  for  computing  an  e-solution  of  problem 
(2.1)  can  be  bounded  by 


N{e)  -.=  3  + 


1 


cMR^+P  \  1+"'’ 


I  —  qi+3p  \(l  +  p)0/3 


(3.18) 


where  q  is  defined  in  (3.11). 

Proof.  We  first  prove  part  a).  Let  Ag  :=  ubg  —  Ibg,  without  loss  of  generality,  we  assume  that  Ai  >  e.  In 
view  of  step  0  in  the  FAPL  method  and  (1.4),  we  have 


Ai  <  /(pi)  -  h{po,pi)  =  /(pi)  -  /(po)  -  {f{po),Pi  -  Po)  < 


{2R)'^+PM 

1  +  p 


(3.19) 


Also,  by  Lemma  3.3  we  can  see  that  Ag+i  <  qAg  for  any  s  >  1,  which  implies  that 


Ag+i  <  g®Ai,  Vs  >  0. 

Moreover,  if  an  e-solution  is  found  after  S  phases  of  the  FAPL  method,  then  we  have 


As  >  e>  As+i. 
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(3.20) 


Combining  the  above  three  inequalities,  we  conclude  that 


e  <  q 


^  ^Ai  <  q‘ 


s_,{2Ry+PM 
1  +  P  ’ 


(3.21) 


and  part  a)  follows  immediately  from  the  above  inequality.  We  are  now  ready  to  prove  part  b).  In  view  of 
Lemma  3.3  and  (3.20),  we  have  >  eq^~^ .  Using  this  estimate,  part  a),  and  Proposition  3.4,  we  conclude 
that  the  total  number  of  iterations  performed  by  the  FAPL  method  is  bounded  by 


s 

N{e)=Y,Ns 

S  =  1 


/  cMR^+P  \  1+"" 

W  +  pW) 


s 

E2{S  —  s') 

q  1+3P 

S  =  1 


<  ^  + 


1-gi 


cMR^+P  \  1+=*" 

{i  +  p)0/3e) 


(3.22) 


where  Ng  denotes  the  number  of  the  iterations  of  phase  s  for  1  <  s  <  S'.  □ 

In  view  of  Theorem  3.5,  the  FAPL  method  achieves  the  optimal  iteration  complexity  bounds  for  solving 
nonsmooth,  weakly  smooth,  and  smooth  CP  problems,  which  are  the  same  as  the  convergence  properties  of 
the  ABL  and  APL  method  (see  [18,  20]  for  the  discussions  on  the  complexity  theories  for  solving  CP  problems, 
and  [14]  for  the  convergence  properties  of  the  ABL  and  APL  methods). 

3.2.  FUSL  for  ball-constrained  structured  problems.  In  this  subsection,  we  still  consider  the  ball- 
constrained  problem  in  (2.1),  but  assume  that  its  objective  function  is  given  by 

/(x)  :=/(x)  +  F(a:),  (3.23) 

where  /  is  a  smooth  convex  function,  i.e.,  >  0  s.t. 

f{y)  -  f{x)  -  {V f{x),y  -  x)  <  (3.24) 

and 


F{x)  :=  max{(Aa:,?/)  -5(1/)}.  (3.25) 

y&Y 

Here,  Y  C  is  a  compact  convex  set,  g  ^Y  — >■  M  is  a  relatively  simple  convex  function,  and  A  :  K"  — >■  K™ 
is  a  linear  operator.  Our  goal  is  to  present  a  new  prox-level  method  for  solving  problem  (2.1)-(3.23),  which 
can  significantly  reduce  the  iteration  cost  of  the  USL  method  in  [14]. 

Generally,  the  function  F  given  by  (3.25)  is  non-smooth.  However,  in  an  important  work  [21],  Nesterov 
demonstrated  that  this  function  can  be  closely  approximated  by  a  class  of  smooth  convex  functions.  In 
particular,  letting  v  '.Y  — >■  K  be  a  prox-function  with  modulus  tr^  and  denoting  Cj,  :=  argminj,gyu(?/),  we  can 
approximate  F  in  (3.25)  by  the  smooth  function 

Fr,{x)  :  =  max{(Ax,y)  -  g{y)  -  pV {y)} ,  (3.26) 

yGY 

where  77  >  0  is  called  the  smoothing  parameter,  and  U(-)  is  the  Bregman  divergence  defined  by 

V{y)  :=  v{y)  -  u(c„)  -  (Vu(c„),y  -  c„) .  (3.27) 

It  was  shown  in  [21]  that  the  gradient  of  Fl,(-)  given  by  VF^(a;)  =  A*y*{x)  is  Lipschitz  continuous  with 
constant 


:=  \\Af/{pa,),  (3.28) 

where  jjAj|  is  the  operator  norm  of  A,  A*  is  the  adjoint  operator,  and  y*(x)  S  U  is  the  solution  to  the 
optimization  problem  in  (3.26).  Moreover,  the  “closeness”  of  Fjj(-)  to  F(-)  depends  linearly  on  the  smoothing 
parameter  77,  i.e., 

Frj(x)  <  F(x)  <  Frj(x)  +  pDy^Y,  Vx  €  A, 
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(3.29) 


where 


Dy^Y  ■=  max  {v{y)  -  v{z)  -  {\7v{z),  y-  z)}. 

y,z&Y 


(3.30) 


Therefore,  if  we  denote 


fvi^)  ■= 

(3.31) 

/r,(x)  <  /(x)  <  fr,{x)  -b  yDy^Y- 

(3.32) 

Applying  an  optimal  gradient  method  to  minimize  the  smooth  function  in  (3.31),  Nesterov  proves  in  [21]  that 
the  iteration  complexity  for  computing  an  e-solution  to  problem  (2.1)-(3.23)  is  bounded  by  0{\/e).  However, 
the  values  of  quite  a  few  problem  parameters,  such  as  ||A||,  and  are  required  for  the  implementation 

of  Nesterov’s  smoothing  scheme. 

By  incorporating  Nesterov’s  smoothing  technique  [21]  into  the  APT  method,  Lan  developed  in  [14]  a  new 
bundle-level  type  method,  namely  the  uniform  smoothing  level  (USL)  method,  to  solve  structured  problems 
given  in  the  form  of  (3.23).  While  the  USL  method  achieves  the  same  optimal  iteration  complexity  as 
Nesterov’s  smoothing  scheme  in  [21],  one  advantage  of  the  USL  method  over  Nesterov’s  smoothing  scheme 
is  that  the  smoothing  parameter  rj  is  adjusted  dynamically  during  the  execution,  and  an  estimate  of  Dy  y 
is  obtained  automatically,  which  makes  the  USL  method  problem  parameter  free.  However,  similar  to  the 
APL  method,  each  iteration  of  the  USL  method  involves  the  solutions  of  two  subproblems.  Based  on  the  USL 
method  in  [14]  and  our  analysis  of  the  FAPL  method  in  Section  3.1,  we  propose  a  fast  USL  (FUSL)  method 
that  solves  problem  (3.23)  with  the  same  optimal  iteration  complexity  as  the  USL  method,  but  requiring  only 
to  solve  one  simpler  subproblem  in  each  iteration. 

Similar  to  the  FAPL  method,  the  FUSL  method  consists  of  different  phases,  and  each  phase  calls  a  gap 
reduction  procedure,  denoted  by  Gfusl,  to  reduce  the  gap  between  the  upper  and  lower  bounds  on  ^  in 
(2.1)  by  a  constant  factor.  We  start  by  describing  procedure  Gfusl- 


Procedure  2  The  FUSL  gap  reduction  procedure:  (a:''",  Z1+,  lb'*')  =  Gfusl{x,  D,lh,  R,x,  /3,9) 

0;  Let  fc  =  1,  /g  =  f{x),  /  =  /3  •  lb  -|-  (1  —  /?)/o,  Qo  =  R",  Xq  =  x,  Xq  G  B{x,  R)  be  arbitrarily  given,  and 

V  ■=  d{7o  -  (3.33) 

1:  Update  the  cutting  plane  model:  set  x].  to  (3.1),  Q^.  to  (3.3),  and 

h{xi,x)  =  h^{x\,x)  =  fr,{xi)  +  {f^{xi),x-x[)  .  (3.34) 

2:  Update  the  prox-center:  set  Xk  to  (3.4).  If  =0  or  —  xjj  >  R,  then  terminate  with  output 
x~^  =  11+  =  D,  lb+  =  1. 

3;  Update  the  upper  bound  and  the  estimate  of  Dy^Y-  set  x^  to  (3.5),  x^  to  (3.6),  and  =  f{x^).  Check 
the  following  conditions: 

3a)  if  fix'^)  <  I  -I-  0(/g  —  1),  then  terminate  with  output  a;+  =  x'^,  Z1+  =  D,  lb+  =  lb. 

3b)  if  /(x^)  >  I  +  6>(/o  —  1)  and  /,,(x^)  <  I  +  |(/g  —  1),  then  terminate  with  output  x+  =  = 

2D,lb+  =  lb. 

4:  Choose  Qk  as  same  as  Step  4  in  Gfapl,  set  k  =  k  +  1,  and  go  to  step  1. 


A  few  remarks  about  procedure  Gfusl  are  in  place.  Firstly,  since  the  nonsmooth  objective  function  / 
is  replaced  by  its  smoothed  approximation  /^,  we  replace  the  cutting  plane  model  in  (1.8)  with  the  one  for 
fjj  (see  (3.34)).  Also  note  that  for  the  USL  method  in  [14],  /  is  assumed  to  be  a  simple  Lipschitz  continuous 
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convex  function,  and  only  Frj  is  approximated  by  the  linear  estimation.  However  in  the  FUSL  method,  we 
assume  /  is  general  smooth  convex,  and  linearize  both  /  and  in  (3.34).  Secondly,  the  smoothing  parameter 
77  is  specified  as  a  function  of  the  parameter  D,  /o  and  Z,  where  D  is  an  estimator  of  in  (3.30)  and  given 
as  an  input  parameter  to  procedure  Gfusl-  Thirdly,  same  to  the  FAPL  method,  the  parameters  {ofc}  are 
chosen  according  to  (3.8).  Such  conditions  are  required  to  guarantee  the  optimal  convergence  of  the  FUSL 
method  for  solving  problem  (2.1)-(3.23).  Fourthly,  similar  to  the  FAPL  method,  the  feasible  sets  Qk,Qk,Qk 
only  contains  a  limited  number  of  linear  constraints,  and  there  is  only  one  subproblem  (i.e.  (3.4))  involved  in 
procedure  Gfusl,  which  can  be  solved  exactly  when  the  depth  of  memory  is  small. 

The  following  lemma  provides  some  important  observations  about  procedure  Gfusl,  which  are  similar  to 
those  for  the  USL  gap  reduction  procedure  in  [14]. 

Lemma  3.6.  The  following  statements  hold  for  procedure  Gfusl- 

a)  If  this  procedure  terminates  at  steps  2  or  3a),  then  we  have  ub’*’  —  Ib^  <  q(ub  —  lb),  where  q  is  defined 
in  (3.11)  and  ub  :=  /g,ub''’  :=  f{x'^). 

b)  If  this  procedure  terminates  at  step  3b),  then  D  <  D^  y  and  D'^  <  2Dy^y. 

Proof.  The  proof  of  part  a)  is  as  the  same  as  that  of  Lemma  3.3,  and  we  only  show  part  b)  here.  By  the 
termination  condition  at  step  3b),  we  have  fix'):)  >  I  +  0(ub  —  1)  and  fnix'))  <  I  +  f (ub  —  1).  So, 

fixt)-  fvixt)  >  ^(ub-0- 

We  conclude  from  the  above  relation,  (3.32),  and  (3.33)  that 


p  ^  fix'))  -  Mx))  ^  einh  -1) 

V  277 

Finally,  £>+  <  2Dy  y  comes  immediately  from  the  above  relation  and  the  definition  of  £>+  in  step  3b).  □ 

The  following  results  provides  a  bound  on  the  number  of  iterations  performed  by  procedure  Gfusl- 
Proposition  3.7.  Suppose  that  {ak\k>i  in-  procedure  Gfusl  are  chosen  such  that  (3.8)  holds.  Then,  the 
number  of  iterations  performed  by  this  procedure  does  not  exceed 


N{A,D)  :=i? 


6»/3A 


0/3A  V  CTy 


(3.35) 


where  A  :=  f{x)  —  lb. 

Proof.  It  is  easy  to  see  that  the  gradient  of  in  (3.23)  has  Lipschitz  continuous  gradient  with  constant 
L  =  Lj  +  Ljj,  where  L,,  and  Lj,  are  defined  in  (3.28)  and  (3.24),  respectively.  Suppose  that  procedure  Gfusl 
does  not  terminate  at  step  k.  Noting  that  the  prox- function  dix)  in  procedure  Gfusl  has  modulus  1,  similarly 
to  the  discussion  on  (3.14),  we  have 


fvix))  -  I  < 


cLd{xk) 

P 


cLR^ 

-  2/c2  ’ 


(3.36) 


where  c  is  defined  in  (3.8),  and  the  second  inequality  is  from  (3.13).  Also,  since  procedure  Gfusl  does  not 
terminate,  in  view  of  the  termination  condition  at  step  3b)  and  the  definition  of  I  in  step  0,  we  have 

Mx))-l>^-^.  (3.37) 

Combining  the  above  two  relations,  and  noting  (3.28)  and  (3.33),  we  conclude  that 


k  < 


IcLR'^ 

0I3A 


<  R 


V2R\\A\\  IffD 
9/3A  V  Uy 


(3.38) 


□ 
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Algorithm  3  The  fast  uniform  smoothing  level  (FUSL)  method 
0:  Given  ball  B{x,R)^  choose  initial  point  po  €  B{x^R),  prox-function  u(-)  for  the  smoothing  function  F^i  in 
(3.26)  and  (3.27),  initial  guess  Di  on  the  size  in  (3.30),  tolerance  £  >  0,  and  parameters  /3,0  G  (0, 1). 
1;  Set  Pi  G  Argmin^g5(g,_fl)  h{po,x),  Ibi  =  ^(po,^!),  ubi  =  min{/(po), /(pi)},  let  xi  be  either  po  or  pi  such 
that  /(xi)  =  ubi,  and  s  =  1. 

2:  If  ubg  —  lbs  <  £)  terminate  and  output  approximate  solution  x. 

3:  Set  (xs+i,£>s+i,lbs+i)  =  C/_Fc/sL(xs,£>s,lbs,i?,x,/3, 0)  and  ubs+i  =  f{x). 

4:  Set  s  =  s  +  1  and  go  to  step  2. 


We  are  now  ready  to  describe  the  FUSL  method  which  iteratively  calls  procedure  Gfusl  to  solve  the 
structured  saddle  point  problem  (2.1)-(3.23). 

Similar  to  the  FAPL  method,  we  say  that  a  phase  of  the  FUSL  method  occurs  when  s  increases  by  1. 
More  specifically,  similar  to  the  USL  method,  we  classify  two  types  of  phases  in  the  FUSL  method.  A  phase 
is  called  significant  if  the  corresponding  Gfusl  procedure  terminates  at  steps  2  or  3a),  otherwise  it  is  called 
non-significant.  Clearly,  if  the  value  of  Dy^y  is  provided,  which  is  the  assumption  made  in  Nesterov’s  smoothing 
scheme  [21],  then  we  can  set  Di  =  Dy^y  in  the  scheme  of  both  the  original  and  modified  FUSL  method,  and 
consequently,  all  the  phases  of  both  the  original  and  modified  FUSL  methods  become  significant. 

For  the  sake  of  simplicity,  an  iteration  of  procedure  Gfusl  is  also  referred  to  an  iteration  of  the  FUSL 
method.  The  following  result  establishes  a  bound  on  the  total  number  of  iterations  performed  by  the  FUSL 
method  to  find  an  e-solution  of  problem  (2.1)-(3.23).  Note  that  the  proof  of  these  results  is  similar  to  that  of 
Theorem  7  in  [14]. 

Theorem  3.8.  Suppose  that  {ak}  in  procedure  Gfusl  «££  chosen  such  that  (3.8)  holds.  Then,  the  total 
number  of  iterations  performed  by  the  FUSL  method  for  computing  an  e-solution  of  problem  (2.1)-(3.23)  is 
bounded  by 


N{e) 


<51  -f  <5'2  -f  ( 
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(3.39) 


where  q  and  Dy  y  are  defined  in  (3.11)  and  (3.30)  respectively,  and 


D  :=  maxjUi,  2Dy  Y'\,  Si  :=  max 


Dyy 


and  S2 


logi 

<? 


AV2R\\A\\i/^+2R^Lj 

e 


(3.40) 


Proof.  We  prove  this  result  by  estimating  the  numbers  of  iterations  performed  within  both  non-significant 
and  significant  phases.  Suppose  that  the  set  of  indices  of  the  non-significant  and  significant  phases  are 
{mi,  m2, . . . ,  nisj}  and  jni,  77,2, . . . ,  ngj}  respectively.  For  any  non-significant  phase  mfc,  1  <  A:  <  Si,  we 
can  easily  see  from  step  3b)  that  Urnt+i  =  ,  by  part  b)  in  Lemma  3.6,  the  number  of  non-significant 

phases  performed  by  the  FUSL  method  is  bounded  by  Si  defined  above,  i.e.,  si  <  Si. 

In  addition,  since  <  D,  we  have  Dmi,  <  {1/2Y^~^D,  where  D  is  defined  above.  Combining  the 

above  estimates  on  si  and  Dmk^  ^'’^d  in  view  of  the  fact  Am*.  >  e  for  all  1  <  fc  <  si,  we  can  bound  the  number 
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of  iterations  performed  in  non-significant  phases  by 
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Applying  Lemma  8  in  [14]  and  relation  (3.24),  and  in  view  of  the  fact  that  po,pi  G  B{x,R)  in  Algorithm  3, 
the  initial  gap  is  bounded  as 


Ai  :=  ubi  -  Ibi  <  [F{po)  -  F{pi)  -  {F'{pi),po  -  Pi)]  +  [/(po)  -  f{pi)  -  (^f{pi),Po-Pi 

<  472i?||  Ajj  +  2i?^L;, 


(3.42) 

(3.43) 


where  F'{pi)  G  dF{pi).  Then  for  significant  phases,  similarly  to  the  proof  of  Theorem  3.5,  we  have  S2  <  S2- 
Moreover,  for  any  Uk,  1  <  fc  <  S2,  using  Lemmas  3.3,  3.6,  we  have  <  D,  <  qAn^,  and  A„^^  >  e, 

which  implies  A„,,  >  Combining  such  an  estimate  on  Zl„^,  A„^  and  bound  on  S2,  we  can  see  that 

the  total  number  of  iterations  performed  the  significant  phases  is  bounded  by 


A2  =£  A(A„,,74„J  <  £  A(e/7^-^7) 
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(3.44) 


Finally,  the  total  number  of  iterations  performed  by  the  FUSL  method  is  bounded  by  Ni  +  N2,  and  thus 
(3.39)  holds.  □ 

From  (3.39)  in  the  above  theorem,  we  can  see  that  the  iteration  complexity  of  the  FUSL  method  for 
solving  problem  (2.1)-(3.23)  is  bounded  by 


O 


(3.45) 


The  above  iteration  complexity  is  the  same  as  that  of  the  Nesterov  smoothing  scheme  in  [21]  and  the  USL 
method  in  [14].  However,  both  the  USL  and  FUSL  methods  improve  Nesterov’s  smoothing  scheme  in  that 
both  of  them  are  problem  parameter  free.  In  addition,  as  detailed  in  Subsection  3.3  below,  the  FUSL  method 
further  improves  the  USL  method  by  significantly  reducing  its  iteration  cost  and  improving  the  accuracy  for 
solving  its  subproblems. 

3.3.  Solving  the  subproblems  of  FAPL  and  FUSL.  In  this  section,  we  introduce  an  efficient  method 
to  solve  the  subproblems  (3.4)  in  the  FAPL  and  FUSL  methods,  which  are  given  in  the  form  of 

c  :=argmin,^gQi||a:-p|l^. 
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x] 


(3.46) 


Here,  Q  is  a  closed  polyhedral  set  described  by  m  linear  inequalities,  i.e. 


Q  :=  {cc  e  K"  :  {Ai,  x)  <  bi,  i  =  1,2, . . . ,  m}. 
Now  let  us  examine  the  Lagrange  dual  of  (3.46)  given  by 


max  min  -lla;  —  nil 
A>0  kGR"  2 


+  Ai  [{Ai ,  x)  —  hi\ . 


(3.47) 


It  can  be  checked  from  the  theorem  of  alternatives  that  problem  (3.47)  is  solvable  if  and  only  if  Q  7^  0.  Indeed, 
if  Q  7^  0,  it  is  obvious  that  the  optimal  value  of  (3.47)  is  finite.  On  the  other  hand,  if  Q  =  0,  then  there 
exists  A  >  0  such  that  A^H  =  0  and  A^6  <  0,  which  implies  that  the  optimal  value  of  (3.47)  goes  to  infinity. 
Moreover,  if  (3.47)  is  solvable  and  A*  is  one  of  its  optimal  dual  solutions,  then 


xI=p-Y.KA,.  (3.48) 

i=l 


It  can  also  be  easily  seen  that  (3.47)  is  equivalent  to 

max  — -A^MA  +  C^A, 
A>0  2 


(3.49) 


where  Mij  :=  {Ai,  Aj) ,  Ci  :=  {Ai,p)  —  bi,  Vz,  j  =  1,2, ...  ,m.  Hence,  we  can  determine  the  feasibility  of  (3.46) 
or  compute  its  optimal  solution  by  solving  the  relatively  simple  problem  in  (3.49). 

Many  algorithms  are  capable  of  solving  the  above  nonnegative  quadratic  programming  in  (3.49)  efficiently. 
Due  to  its  low  dimension  (usually  less  than  10  in  our  practice),  we  propose  a  brute-force  method  to  compute 
the  exact  solution  of  this  problem.  Consider  the  Lagrange  dual  associated  with  (3.49): 

minmax£(A,  p)  :=  M\  —  (C^  -|-  ^)A, 

A>0  /i>0  2 


where  the  dual  variable  is  p  :=  (/ii,/i2, . . .  ,/im).  Applying  the  KKT  condition,  we  can  see  that  A*  >  0  is  a 
solution  of  problem  (3.49)  if  and  only  if  there  exists  p*  >0  such  that 

Va/:(A*,^*)  =  0  and  {\,n)  =  0.  (3.50) 


Note  that  the  first  identity  in  (3.50)  is  equivalent  to  a  linear  system: 


(M 


Ai\ 

(b,\ 

Am 

Pi 

\bj 

\Pm/ 

(3.51) 


where  /  is  the  m  x  m  identity  matrix.  The  above  linear  system  has  2m  variables  and  m  equations.  But  for 
any  z  =  1, . . . ,  m,  we  have  either  A^  =  0  or  =  0,  and  hence  we  only  need  to  consider  2"*  possible  cases  on 
the  non-negativity  of  these  variables.  Since  m  is  rather  small  in  practice,  it  is  possible  to  exhaust  all  these 
2™  cases  to  find  the  exact  solution  to  (3.50).  For  each  case,  we  first  remove  the  m  columns  in  the  matrix 
(M  —  I)  which  correspond  to  the  m  variables  assumed  to  be  0,  and  then  solve  the  remaining  determined  linear 
system.  If  all  variables  of  the  computed  solution  are  non-negative,  then  solution  {X*,p*)  to  (3.50)  is  found, 
and  the  exact  solution  x*  to  (3.46)  is  computed  by  (3.48),  otherwise,  we  continue  to  examine  the  next  case. 
It  is  interesting  to  observe  that  these  different  cases  can  also  be  considered  in  parallel  to  take  the  advantages 
of  high  performance  computing  techniques. 
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3.4.  Extending  FAPL  and  FUSE  for  unconstrained  problems.  In  this  subsection,  we  study  how 
to  utilize  the  FAPL  and  FUSL  method  to  solve  the  unconstrained  problems  based  on  our  results  in  Section  2. 

Let  us  first  consider  the  case  when  /  in  (1.1)  satisfies  (1.4).  If  the  method  A  in  step  1  of  Algorithm  1 
is  given  by  the  FAPL  method,  then  by  Theorem  3.5,  the  number  of  evaluations  of  f'  within  one  call  to 
A(x,2rk,  Ak)  is  bounded  by 


cMk(2ny+P'‘ 


2 

l+3pfc 


(1  +  pk)0/3Ak 


(3.52) 


where  c  is  a  universal  constant,  Mk  '■=  M{B{x,2rk))  and  :=  p{B{x,2rk))  are  constants  corresponding  to 
the  assumption  in  (1.4).  By  Theorem  2.2,  the  number  of  evaluations  of  /'  up  to  the  k-th  iteration  of  Algorithm 
1  is  bounded  by 


O 


M{AD*)^+P 


(3.53) 


where  M  :=  M{B{xAD*)),  P  '■=  p{B{x,4:D*))  and  :=  f{xk)  —  f*  is  the  accuracy  of  the  solution.  It  should 
be  noted  that  the  constants  M  and  p  are  local  constants  that  depend  on  the  distance  from  x  and  x*,  which 
are  not  required  for  the  FAPL  method  and  Algorithm  1,  and  also  generally  smaller  than  the  constants  M(M") 
and  p(K"),  respectively,  for  the  global  Holder  continuity  condition. 

Moreover,  if  /  in  (1.1)  is  given  in  the  form  of  (3.23)  as  a  structured  nonsmooth  CP  problem,  then  the 
FUSL  method  could  be  applied  to  solve  the  corresponding  structured  ball-constraint  problem  in  Algorithm  1. 
By  Theorem  3.8,  the  number  of  evaluations  of  f'  within  one  call  to  A(x,  2r/j,  Aj,)  is  bounded  by 


Si +S2  +  2rkC' 


2r,C''\\A\\ 

Ak 


(3.54) 


where  C ,C"  are  some  constants  depending  on  the  parameters  q,9,  f5,av,  Dq  and  Dy  y  in  the  FUSL  method. 

Applying  Theorem  2.2  with  ai  =  a2  =  1,  /3i  =  \,  P2  =  1,  Ci{x,R,f)  =  2C' ^JJTp  and  C2(x,R,f)  = 
2C'"||A||,  the  number  of  evaluations  of  /'  up  to  the  k-th  iteration  of  Algorithm  1  is  bounded  by 


O 


4C'"U*||A||\ 


(3.55) 


Similar  to  the  FAPL  method,  here  Lf  :=  Lf{B{x,4:D*))  is  a  lower  bound  of  LflMA). 

4.  Generalization  to  strongly  convex  optimization.  In  this  section,  we  generalize  the  FAPL  and 
FUSL  methods  for  solving  convex  optimization  problems  in  the  form  of  (1.1)  whose  objective  function  / 
satisfies 

fiy)  -  fix)  -  {f'{x),y-x)  >  ^\\y -xf,  Va;,?/eK",  (4.1) 

for  some  /i  >  0.  For  the  sake  of  simplicity,  we  assume  throughout  this  section  that  an  initial  lower  bound 
Ibo  <  /*  is  available^.  Under  this  assumption,  it  follows  from  (4.1)  that  ||po  —  a^*|p  <  2[f{po)  —  lbo]/p  for 
a  given  initial  point  po,  and  hence  that  the  FAPL  and  FUSL  methods  for  ball-constrained  problems  can  be 
directly  applied.  However,  since  the  lower  and  upper  bounds  on  f*  are  constantly  improved  in  each  phase 
of  these  algorithms,  we  can  shrink  the  ball  constraints  by  a  constant  factor  once  every  phase  accordingly. 
We  show  that  the  rate  of  convergence  of  the  FAPL  and  FUSL  methods  can  be  significantly  improved  in  this 
manner. 

^Otherwise,  we  should  incorporate  a  guess-and-check  procedure  similar  to  the  one  in  Section  2. 
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We  first  present  a  modified  FAPL  method  for  solving  black-box  CP  problems  which  satisfy  both  (1.4)  and 
(4.1).  More  specihcally,  we  modify  the  ball  constraints  used  in  the  FAPL  method  by  shifting  the  prox-center 
X  and  shrinking  the  radius  R  in  procedure  Qfapl-  Clearly,  such  a  modification  does  not  incur  any  extra 
computational  cost.  This  algorithm  is  formally  described  as  follows. 


Procedure  3  The  modified  FAPL  gap  reduction  procedure:  (a:+,  lb'*’)  =  QFAPL{x,Vo,r,  j3,9) 

In  Procedure  1,  set  x  =  x,  and  consequently  the  prox-function  d  in  (3.4)  is  replaced  by  ||a;  —  i|P/2. 


Algorithm  4  The  modified  FAPL  method  for  minimizing  strongly  convex  functions 
In  Algorithm  2,  change  steps  0,  1  and  3  to 

0;  Choose  initial  lower  bound  Ibi  <  /*,  initial  point  po  S  K",  initial  upper  bound  ubi  =  f{po),  tolerance 
e  >  0  and  parameters  (3,0  G  (0, 1). 

1:  Set  xi  =  po)  and  s  =  1. 

3:  Set  (is+i,lbs+i)  =  ^FAPL(is,  lbs,  a/2(/(xs)  -  lbs)//i,/3,6>)  and  ub^+i  =  f{xs+i). 


A  few  remarks  on  the  above  modihed  FAPL  method  are  in  place.  Firstly,  let  x*  be  the  optimal  solution 
of  problem  (1.1)  and  define  Ag  =  ubg  —  Ibg.  By  the  definition  of  ubg  and  lbs,  we  have  /(is)  —  f{x*)  <  As, 
which,  in  view  of  (4.1),  then  implies  that 


||is-a:*f  <  — =:r2,  (4.2) 

d 

and  X*  £  B{xs,r).  Secondly,  similar  to  procedure  Qfapl,  if  procedure  Qfapl  terminates  at  step  2,  we  have 
£f{l)  n  B{xs,r)  =  0.  Combining  this  with  the  fact  x*  £  B{xs,r),  we  conclude  that  I  is  a  valid  lower  bound 
on  /*.  Therefore,  no  matter  whether  procedure  Qfapl  terminates  at  step  2  or  step  4,  the  gap  between  upper 
and  lower  bounds  on  /*  has  been  reduced  and  As+i  <  gAs,  where  the  q  is  defined  in  (3.11). 

We  establish  in  Theorem  4.1  the  iteration  complexity  bounds  of  the  modihed  FAPL  method  for  minimizing 
strongly  convex  functions. 

Theorem  4.1.  Suppose  that  {ak\k>i  in  procedure  Qfapl  are  chosen  such  that  (3.8)  holds.  Then  the 
total  number  of  iterations  performed  by  the  modified  FAPL  method  for  computing  an  e-solution  of  problem 
(1.1)  is  bounded  by 
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respectively,  for  smooth  strongly  convex  functions  (i.e.,  p  =  \)  and  nonsmooth  or  weakly  smooth  strongly 
convex  functions  (i.e.,  p  £  [0,  l)j,  where  q  is  defined  in  (3.11),  Ibi  and  ubi  are  given  initial  lower  bound  and 
upper  hound  on  f*,  and 


S:= 


(4.3) 


Proof  Suppose  that  procedure  Qfapl  does  not  terminate  at  the 
(3.14)  and  (4.2)  that 


inner  iteration.  It  then  follows  from 
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1  +  p 
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(4.4) 


Moreover,  in  view  of  the  termination  condition  at  steps  3  and  relation  (4.2),  we  have  f{x^)  —  I  >  0(ubs  —  1)  = 
9(3As  and  r  =  ^2As/ /i.  Combining  all  the  above  observations  we  conclude  that 


k  < 


l  +  P 

2  2  cM 


l  +  3p 


V6»/3(l +  J 

So  the  number  of  inner  iterations  performed  in  each  call  to  procedure  Gfapl  is  bounded  by 


(4.5) 
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(4.6) 


Since  the  gap  between  the  upper  and  lower  bounds  on  /*  is  reduced  by  a  constant  factor  in  each  phase,  i.e., 
^s+i  <  qAs,  it  easy  to  see  that  the  total  number  of  phases  is  bounded  by  S  defined  above.  Using  the  previous 
two  conclusions  and  the  fact  that  A^  >  €/q^~^ ,  we  can  show  that  the  total  number  of  iterations  performed  by 
the  modified  FAPL  method  is  bounded  by 


~  f  2-^cM 
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Specifically,  if  /  is  smooth  [p  =  1),  then  the  above  bound  is  reduced  to 
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If  /  is  nonsmooth  (p  =  0)  or  weakly  smooth  {p  G  (0, 1)),  then  the  above  bound  is  equivalent  to 
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Now  let  us  consider  the  structured  CP  problems  with  /  given  by  (3.23),  where  the  smooth  component 
/  is  strongly  convex  with  modulus  p.  Similar  to  the  modified  FAPL  method,  we  present  a  modified  FUSL 
method  for  solving  this  strongly  convex  structured  CP  problems  as  follows. 


Procedure  4  The  modified  FUSL  gap  reduction  procedure:  (a:+,  H'*',  lb'*')  =  Gfapl{x,  D,\h,r,  (3,9) 
In  Procedure  2,  set  x  =  x,  and  consequently  the  prox-function  d  is  replaced  by  ||a;  —  5|P/2. 


Algorithm  5  The  modified  FUSL  method  for  minimizing  strongly  convex  functions 
In  Algorithm  3,  change  steps  0,1  and  3  to 

0;  Choose  initial  lower  bound  Ibi  <  /*,  initial  point  pq  G  K",  initial  upper  bound  ubi  =  f{po),  prox-function 
v{-),  initial  guess  Di  on  the  size  D^  y,  tolerance  e  >  0  and  parameters  (3,9  G  (0, 1). 

1:  Set  xi  =  Pq,  and  s  =  1. 

3:  Set  (xs+i,£>5+i,lbs+i)  =  Qfusl{xs,Ds, \hs,  ^J2{f{xs)  -  \hs)/p,l3,9)  and  ubs+i  =  f{xs+i). 


In  the  following  theorem,  we  describe  the  convergence  properties  of  the  modified  FUSL  method  for  solving 
(l.l)-(3.23)  with  strongly  convex  smooth  component  /. 

Theorem  4.2.  Suppose  that  {afc}fc>i  in  procedure  Gfusl  are  chosen  such  that  (3.8)  holds.  Then  we 
have  the  following  statements  hold  for  the  modified  FUSL  method. 
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a)  The  total  number  of  iterations  performed  by  the  modified  FUSL  method  for  computing  an  e-solution 
of  problem  (l.l)-(3.23)  is  bounded  by 


{Si  +  S) 


,  m\\^ 

^/3(l  -  Vq)\ 


(4.10) 


where  q  is  defined  in  (3.8),  and  D  are  defined  in  (3.40),  and  S  is  defined  in  (4.3). 
b)  In  particular,  if  Dy^y  is  known,  and  set  Di  =  Dy^y  at  Step  0,  then  the  number  of  iterations  performed 
by  the  modified  FUSL  method  is  reduced  to 


N{e)  :=  S 
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Proof.  Similarly  to  the  discussion  in  Theorem  3.8,  we  classify  the  non-significant  and  significant  phases 
and  estimates  the  numbers  of  iterations  performed  by  each  type  of  phases.  Suppose  that  the  set  of  indices  of 
the  non-significant  and  significant  phases  are  {mi,  m2, . . . ,  m^^ }  and  {ni,  712, . . . ,  rig^}  respectively.  Then  the 
number  of  nonsignificant  phases  i^bounded  by  Si,  i.e.,  si  <  Si.  And  since  Ai  =  ubi  —  Ibi,  so  the  number  of 
significant  phases  is  bounded  by  S  defined  above,  i.e.,  S2  <  S2. 

In  view  of  Proposition  3.7,  and  substitute  r  =  y  we  have  for  any  phase 


N{A,D)  := 


1 2cL  ^ 

9/3p 


cD 


2||4l||  _ 

9(3  y  ayfiA 


1. 


(4.12) 


Following  similar  discussion  in  Theorem  3.8,  we  have  the  number  of  iterations  performed  by  non-significant 
phases  in  the  modified  FUSL  method  is  bounded  by 


Ni  <^N{e,D/r^-^) 


k=l 


<Si 


<Si 


k=l 


l2cL^^ 

9(3fj, 


+  1  + 


2|1A||  /  cD  ^  s,-k 


ayfj.e 


1 2cLj 
dfdpL 


+  1  + 


Si 

k=l 

I  cD 

0(3{l  -  i/q)  V  CTyfie' 


2||4l|| 


(4.13) 

(4.14) 

(4.15) 


And  the  bound  on  number  of  iterations  performed  by  all  significant  phases  is  given  by 


k=l 


<  S 


<  S 


l2cLf 

9Pp 


i2cLj 

9/3p 


Y,N{e/q^--\b) 

(4.16) 

k^l 

2||A|1  /  cD  Yq^ 

9P  V  cType^^^ 

(4.17) 

2||A||  /  cD 

0/3(1  -  y/q)  V 

(4.18) 

Therefore,  the  total  number  of  iterations  is  bounded  by  Ni  N2,  and  thus  part  a)  holds. 
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For  part  b),  in  view  of  Lemma  3.6,  we  can  see  that  if  Di  =  Dy^y,  then  Dg  =  Dy^y  for  all  s  >  1,  and  all 
phases  of  the  modified  FUSL  method  are  significant.  Therefore,  replace  and  D  in  (4.16),  we  can  conclude 
part  b)  holds.  □ 

In  view  of  the  above  Theorem  4.2,  we  can  see  that  the  iteration  complexity  of  the  modified  FUSL  method 
for  solving  the  structured  CP  problem  (3.23)  is  bounded  by  O  (||A||/-ye) . 

5.  Numerical  experiments.  In  this  section  we  present  our  experimental  results  of  solving  a  few  large- 
scale  CP  problems,  including  the  quadratic  programming  problems  with  large  Lipschitz  constants,  and  two 
different  types  of  variation  based  image  reconstruction  problems,  using  the  FAPL  and  FUSL  methods,  and 
compare  them  with  some  other  first-order  algorithms.  All  the  algorithms  were  implemented  in  MATLAB, 
Version  R2011a  and  all  experiments  were  performed  on  a  desktop  with  an  Inter  Dual  Core  2  Duo  3.3  GHz 
CPU  and  8G  memory. 

5.1.  Quadratic  programming.  The  main  purpose  of  this  section  is  to  investigate  the  performance  of 
the  FAPL  method  for  solving  smooth  CP  problems  especially  with  large  Lipschitz  constants.  For  this  purpose, 
we  consider  the  quadratic  programming  problem: 

niin  \\Ax-b\\^,  (5.1) 

lkll<i 

where  A  £  ^  ^  compare  the  FAPL  method  with  Nesterov’s  optimal  method  (NEST)  for 

smooth  functions  [21],  NERML  [3],  and  APL  [14].  We  also  compare  the  FAPL  method  with  the  the  built-in 
Matlab  linear  system  solver  in  view  of  its  good  practical  performance.  In  the  APL  method,  the  subproblems 
are  solved  by  MOSEK  [17],  an  efficient  software  package  for  linear  and  second-order  cone  programming.  Two 
cases  with  different  choices  of  the  initial  lower  bound  LB  in  this  experiments  are  conducted:  (1).  LB  =  0  and 
(2).  LB  =  -oo. 

In  our  experiments,  given  m  and  n,  two  types  of  matrix  A  are  generated.  The  first  type  of  matrix  A 
is  randomly  generated  with  entries  uniformly  distributed  in  [0,1],  while  the  entries  of  the  second  type  are 
normally  distributed  according  to  N{0, 1).  We  then  randomly  choose  an  optimal  solution  x*  within  the  unit 
ball  in  K",  and  generate  the  data  b  hy  b  =  Ax*.  We  apply  all  the  four  methods  to  solve  (5.1)  with  this  set  of 
data  A  and  b,  and  the  accuracy  of  the  generated  solutions  are  measured  by  et  =  \\Axk  —  b\\^.  The  results  are 
shown  in  Tables  5.1,  5.2  and  5.3. 

The  advantages  of  the  FAPL  method  can  be  observed  from  these  experiments.  Firstly,  it  is  evident  that 
BL  type  methods  have  much  less  iterations  than  NEST  especially  when  the  Lipschitz  constant  is  large.  Among 
these  three  BL  type  methods,  NERML  requires  much  more  iterations  than  APL  and  FAPL,  which  have  optimal 
iteration  complexity  for  this  problem. 

Secondly,  compared  with  previous  BL  type  methods  (APL  and  NERML),  FAPL  has  much  lower  compu¬ 
tational  cost  for  each  iteration.  The  computational  cost  of  FAPL  method  for  each  iteration  is  just  slightly 
larger  than  that  of  NEST  method.  However,  the  cost  of  each  iteration  of  APL  and  NERML  is  10  times  larger 
than  that  of  NEST. 

Thirdly,  consider  the  difference  of  performance  for  setting  the  lower  bound  to  be  0  and  —  oo,  it  is  also 
evident  that  FAPL  method  is  more  robust  to  the  choice  of  the  initial  lower  bound  and  it  updates  the  lower 
bound  more  efficiently  than  the  other  two  BL  methods.  Though  setting  the  lower  bound  to  — oo  increases 
number  of  iterations  for  all  the  three  BL  method,  a  close  examination  reveals  that  the  difference  between 
setting  the  lower  bound  to  zero  and  — oo  for  FAPL  method  is  not  so  significant  as  that  for  APL  and  NERML 
methods,  especially  for  large  matrix,  for  example,  the  second  one  in  Table  5.1  . 

Fourthly,  FAPL  needs  less  number  of  iterations  than  APL,  especially  when  the  required  accuracy  is  high.  A 
plausible  explanation  is  that  exactly  solving  the  subproblems  provides  better  updating  for  the  prox-centers,  and 
consequently,  more  accurate  prox-centers  improve  the  efficiency  of  algorithm  significantly.  The  experiments 
show  that,  for  APL  and  NERML,  it  is  hard  to  improve  the  accuracy  beyond  10“^°.  However,  FAPL  can  keep 
almost  the  same  speed  for  deceasing  the  objective  value  from  10®  to  10“^^. 

Finally,  we  can  clearly  see  from  Table  5.3  that  FAPL  is  comparable  to  or  significantly  outperform  the  built- 
in  Matlab  solver  for  randomly  generated  linear  systems,  even  though  our  code  is  implemented  in  MATLAB 
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rather  than  lower-level  languages,  such  as  C  or  FORTRAN.  We  can  expect  that  the  efficiency  of  FAPL  will 
be  much  improved  by  using  C  or  FORTRAN  implementation,  which  has  been  used  in  the  MATLAB  solver 
for  linear  systems. 

In  summary,  due  to  its  low  iteration  cost  and  effective  usage  of  the  memory  of  first-order  information,  the 
FAPL  method  is  a  powerful  tool  for  solving  smooth  CP  problems  especially  when  the  number  of  variables  is 
huge  and/or  the  value  of  Lipschitz  constant  is  large. 

5.2.  Total-variation  based  image  reconstruction.  In  this  subsection,  we  apply  the  FUSL  method 
to  solve  the  non-smooth  total-variation  (TV)  based  image  reconstruction  problem: 

min  ^IIAm- 6||^ -h  A||u||tv,  (5.2) 

iteR"  2 

where  A  is  a  given  matrix,  u  is  the  vector  form  of  the  image  to  be  reconstructed,  b  represents  the  observed 
data,  and  ||  •  \\tv  is  the  discrete  TV  semi- norm  defined  by 

N 

I|m||tv  :=  XI  (5-3) 

i=l 

where  DiU  €  is  a  discrete  gradient  (finite  differences  along  the  coordinate  directions)  of  the  z-th  component 
of  u,  and  N  is  the  number  of  pixels  in  the  image.  The  ||zz||Ty  is  convex  and  non-smooth. 


Table  5.1 

Uniformly  distributed  QP  instances 


A  :  n  =  4000,  m  =  3000,  L  =  2.0e6,  cq  =  2.89e4 

Alg 

LB 

Iter.  Time  Acc. 

Iter.  Time  Acc. 

FAPL 

0 

—  CX) 

103  3.06  9.47e-7 

277  6.55  5.78e-7 

142  3.76  8.65e-9 

800  19.18  2.24e-ll 

APL 

0 

— oo 

128  37.10  9.07e-7 

300  85.65  6.63e-6 

210  60.85  9.82e-9 

800  234.69  2.59e-9 

NERML 

0 

— oo 

218  58.32  9.06e-7 

300  84.01  I.02e-2 

500  134.62  1.63e-8 

800  232.14  1.71e-3 

NEST 

- 

10000  220.1  3.88e-5 

20000  440.02  3.93e-6 

A  :  n  =  8000,  m  =  4000,  L  =  8.0e6,  cq  =  6.93e4 

Alg 

LB 

Iter.  Time  Acc. 

Iter.  Time  Acc. 

FAPL 

0 

— oo 

70  4.67  7.74e-7 

149  8.99  6.27e-7 

95  6.46  6.85e-10 

276  16.94  6.10e-10 

APL 

0 

— oo 

79  71.24  7.79e-7 

248  205.48  8.16e-7 

144  129.52  3.62e-9 

416  358.96  8.68e-9 

NERML 

0 

— oo 

153  128.71  7.30e-7 

300  257.54  1.18e-3 

300  251.79  4.03e-9 

800  717.13  9.24e-5 

NEST 

- 

10000  681.03  5.34e-5 

20000  1360.52  4.61e-6 

FAPL  method  for  large  dimension  matrix 

Matrix  A:m  x  n 

LB 

Iter. 

Time 

Acc. 

Iter. 

Time 

Acc. 

10000  X  20000 

0 

97 

36.65 

6.41e-ll 

185 

69.31 

7.29e-21 

L=5.0e7 

— oo 

207 

73.70 

8.28e-8 

800 

292.06 

2.32e-15 

10000  X  40000 

0 

67 

49.95 

9.21e-ll 

122 

91.49 

7.27e-21 

L=1.0e8 

— oo 

130 

88.40 

7.11e-8 

421 

295.15 

1.95e-16 

10000  X  60000 

0 

52 

58.06 

7.68e-ll 

95 

106.14 

8.43e-21 

L=1.5e8 

— oo 

156 

160.93 

9.84e-8 

394 

422.4 

7.48e-16 
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One  of  the  approaches  to  solve  this  problem  is  to  consider  the  associated  dual  or  primal-dual  formulations 
of  (5.3)  based  on  the  dual  formulation  of  the  TV  norm: 

||u||Ty  =  max  (p,  Du) ,  where  Y  =  {p=  ,pf^)  g  ^  ||p.||^  <  1, 1  <  i  <  N}.  (5.4) 

peY 

Consequently,  we  can  rewrite  (5.2)  as  a  saddle-point  problem: 

min  max  i II ^u  —  folio -I- A  (p,  Du)  .  (5.5) 

ugb”  peY  2 

Note  that  (5.5)  is  exactly  the  form  we  considered  in  the  USL  and  FUSL  method  if  we  let  g{y)  =  0.  Specifically, 
the  prox-function  v{y)  on  Y  is  simply  chosen  as  v{y)  =  ^||y|P  in  these  smoothing  techniques. 

In  our  experiments,  we  consider  two  types  of  instances  depending  on  how  the  matrix  A  is  generated. 
Specihcally,  for  the  first  case,  the  entries  of  A  are  normally  distributed,  while  for  the  second  one,  the  entries 
are  uniformly  distributed.  For  both  types  of  instances,  first,  we  generated  the  matrix  A  g  then  choose 

some  true  image  Xture  and  convert  it  to  a  vector,  and  finally  compute  fo  by  fo  =  Axtme  +  e,  where  e  is  the 
Gaussian  noise  with  distribution  e  =  N{0,a).  We  compare  the  following  algorithms:  the  accelerated  primal 
dual  (APD)  method  [5],  Nesterov’s  smoothing  (NEST-S)  method  [21,  1],  and  FUSL  method. 

For  our  first  experiment,  the  matrix  A  is  randomly  generated  of  size  4, 096  x  16,  384  with  entries  normally 
distributed  according  to  A^(0, -^/d,  096),  the  image  Xtme  is  a  128  x  128  Shepp-Logan  phantom  generated  by 
MATLAB.  Moreover,  we  set  A  =  10“^  and  the  standard  deviation  a  =  10“^.  The  Lipschitz  constants  are 


Table  5.2 

Gaussian  distributed  QP  instances 


A  :  n 

=  4000, 

m  =  3000,  L  =  2.32e4,  eo  = 

2.03e3 

Alg 

LB 

Iter. 

Time 

Acc. 

Iter. 

Time 

Acc. 

FAPL 

0 

105 

2.78 

8.43e-7 

153 

4.10 

7.84e-10 

—  CX) 

338 

8.02 

6.86e-7 

696 

16.58 

9.74e-10 

APT 

0 

128 

35.12 

9.01e-7 

172 

47.49 

9.28e-9 

— oo 

639 

200.67 

7.92e-7 

800 

258.25 

1.03e-7 

NERML 

0 

192 

48.44 

7.05e-7 

276 

70.31 

1.09e-8 

— oo 

300 

93.32 

3.68e-l 

800 

257.25 

6.41e-2 

NEST 

- 

10000 

211.30 

7.78e-4 

20000 

422.78 

1.95e-4 

A-.n  =  8000, 

m  =  4000,  L  =  2.32e4,  eo  = 

2.03e3 

Alg 

LB 

Iter. 

Time 

Acc. 

Iter. 

Time 

Acc. 

FAPL 

0 

49 

3.25 

8.34e-7 

68 

4.37 

7.88e-10 

—  OO 

165 

9.77 

5.17e-7 

280 

16.18 

5.06e-10 

APL 

0 

59 

48.91 

8.59e-7 

78 

64.95 

1.70e-8 

—  CX) 

300 

268.47 

9.81e-7 

670 

637.70 

9.42e-10 

NERML 

0 

105 

181.23 

9.14e-7 

133 

102.68 

1.39e-8 

—  CX) 

300 

282.56 

9.92e-3 

800 

760.26 

8.32e-4 

NEST 

- 

10000 

567.59 

3.88e-4 

20000 

1134.38 

9.71e-5 

FAPL  method  for  large  dimension  matrix 

Matrix  A:to  x  n 

LB 

Iter. 

Time 

Acc. 

Iter. 

Time 

Acc. 

10000  X  20000 

0 

78 

27.88 

7.22e-ll 

145 

51.81 

6.81e-21 

L=5.7e4 

—  CX) 

228 

78.57 

9.92e-8 

800 

280.19 

1.37e-15 

10000  X  40000 

0 

48 

34.36 

5.97e-ll 

87 

62.24 

8.26e-21 

L=9e4 

—  CX) 

156 

106.12 

7.18e-8 

390 

271.15 

4.29e-16 

10000  X  60000 

0 

34 

36.30 

9.88e-ll 

65 

69.56 

7.24e-21 

L=1.2e5 

—  CX) 

98 

98.11 

9.50e-8 

350 

361.83 

8.34e-16 
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Table  5.3 

Comparison  to  Matlab  solver 


Matlab  A\h 

FAPL  method 

Time 

Acc. 

Iter 

Time 

Acc. 

Uniform  2000  x  4000 

4.41 

5.48e-24 

204 

3.59 

6.76e-23 

Uniform  2000  x  6000 

7.12 

9.04e-24 

155 

4.10 

9.73e-23 

Uniform  2000  x  8000 

9.80 

9.46e-24 

135 

4.45 

9.36e-23 

Uniform  2000  x  10000 

12.43 

1.04e-23 

108 

4.23 

7.30e-23 

Gaussian  3000  x  5000 

11.17 

5.59e-25 

207 

6.25 

7.18e-23 

Gaussian  3000  x  6000 

13.96 

1.43e-24 

152 

5.50 

9.59e-23 

Gaussian  3000  x  8000 

19.57 

1.66e-24 

105 

4.83 

8.17e-23 

Gaussian  3000  x  10000 

25.18 

1.35e-24 

95 

5.43 

5.81e-23 

provided  for  APD  and  NEST-S,  and  the  initial  lower  bound  for  FUSE  method  is  set  to  0.  We  run  300 
iterations  for  all  these  algorithms,  and  report  the  objective  value  of  problem  (5.2)  and  the  relative  error 
defined  by  ||a;fc  —  a;true||2/||a;true||2  as  shown  in  Figure  5.1.  In  our  second  experiment,  the  matrix  A  is  randomly 
generated  with  entries  uniformly  distributed  in  [0, 1].  We  use  a  200  x  200  brain  image  [6]  as  the  true  image 
Xtrue,  and  set  m  =  20, 000,  A  =  10,  cr  =  10“^.  Other  setup  is  the  same  as  the  first  experiment,  and  the  results 
are  shown  in  Figure  5.2. 


True  APD  releti=6.0%  NEST-S  relBri=6.0%  FUSE  releri=l  .6% 


Fig.  5.1.  TV-based  reconstruction  (Shepp-Logan  phantom) 
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x10’ 


True  APD  teleri=24.9%  NEST-S  relerr=24.8%  FUSL  releri=4  6% 


Fig.  5.2.  TV-based  reconstruction  (brain  image) 


We  make  some  observations  about  the  results  in  Figures  5.1  and  5.2.  For  the  first  experiment,  there  is 
almost  no  difference  between  APD  and  NEST-S  method,  but  FUSL  outperforms  both  of  them  after  5  seconds 
in  terms  of  both  objective  value  and  relative  error.  The  second  experiment  clearly  demonstrates  the  advantage 
of  FUSL  for  solving  CP  problems  with  large  Lipschitz  constants.  The  Lipschitz  constant  of  matrix  A  in  this 
instance  is  about  2  x  10®,  much  larger  than  the  Lipschitz  constant  (about  5.9)  in  the  first  experiment.  FUSL 
still  converges  quickly  and  decreases  the  relative  error  to  0.05  in  less  than  100  iterations,  while  APD  and 
NEST-S  converge  very  slowly  and  more  than  1,000  steps  are  required  due  to  the  large  Lipschitz  constants. 
It  seems  that  FUSL  is  not  so  sensitive  to  the  Lipschitz  constants  as  the  other  two  methods.  This  feature  of 
FUSL  makes  it  more  efficient  for  solving  large-scale  CP  problems  which  often  have  big  Lipschitz  constants. 

In  summary,  for  the  TV-based  image  reconstruction  problem  (5.2),  FUSL  not  only  enjoys  the  completely 
parameter- free  property  (and  hence  no  need  to  estimate  the  Lipschitz  constant),  but  also  demonstrates  sig¬ 
nificant  advantages  for  its  speed  of  convergence  and  its  solution  quality  in  terms  of  relative  error,  especially 
for  large-scale  problems. 


5.3.  Partially  parallel  imaging.  In  this  subsection,  we  compare  the  performance  of  the  FUSL  method 
with  several  related  algorithms  in  reconstruction  of  magnetic  resonance  (MR)  images  from  partial  parallel 
imaging  (PPI),  to  further  confirm  the  observations  on  advantages  of  this  method.  The  detailed  background  and 
description  of  PPI  reconstruction  can  be  found  in  [6] .  This  image  reconstruction  problem  in  two  dimensional 
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cases  can  be  modeled  as 


k  N 

min  ^  \\M:FS,u  -  /,f  +  A  || 

j=l  i=l 

where  u  is  the  vector  form  of  a  two-dimensional  image  to  be  reconstructed,  k  is  the  number  of  MR  coils 
(consider  them  as  sensors)  in  the  parallel  imaging  system.  F  G  is  a  2D  discrete  Fourier  transform 

matrix,  Sj  G  (7"^”  is  the  sensitivity  map  of  the  j-th  sensor,  and  M  G  is  a  binary  mask  describes  the 

scanning  pattern.  Note  that  the  percentages  of  nonzero  elements  in  M  describes  the  compression  ration  of 
PPI  scan.  In  our  experiments,  the  sensitivity  map  {Sj}j^i  is  shown  in  Figure  5.3,  the  image  Xtme  is  of  size 
512  X  512  shown  in  Figures  5.4  and  5.5,  and  the  measurements  {fj}  are  generated  by 

/,  =  M{FS,Xtrue  +  eJ7V2  +  e7/7=2),  j  =  l,...,k,  (5.6) 

where  noise  with  entries  independently  distributed  according  to  N{0,a).  We  conduct  two 

experiments  on  this  data  set  with  different  acquisition  rates,  and  compare  the  FUSL  method  to  NEST- 
S  method,  and  the  accelerated  linearized  alternating  direction  of  multipliers  (AL-ADMM)  with  line-search 
method  [25]. 

For  both  experiments,  set  cr  =  3  x  10“^, A  =  10“^,  and  {fj}j^i  are  generated  by  (5.6).  In  the  first 
experiment,  we  use  Cartesian  mask  with  acquisition  rate  14%:  acquire  image  in  one  row  for  every  successive 
seven  rows,  while  for  the  second  one,  we  use  Cartesian  mask  with  acquisition  rate  10%:  acquire  image  in 
one  row  for  every  successive  ten  rows.  The  two  masks  are  shown  in  Figure  5.3.  The  results  of  the  first  and 
second  experiment  are  shown  in  Figures  5.4  and  5.5  respectively.  These  experiments  again  demonstrate  the 
advantages  of  the  FUSL  method  over  these  state-of-the-art  techniques  for  PPI  image  reconstruction. 


Fig.  5.3.  Sensitivity  map  and  Cartesian  mask 


6.  Concluding  remarks.  In  this  paper,  we  propose  two  new  bundle-level  type  methods,  the  FAPL  and 
FUSL  methods,  to  uniformly  solve  black-box  smooth,  nonsmooth,  and  weakly  smooth  CP  problems  and  a  class 
of  structured  nonsmooth  problems.  Our  methods  achieve  the  same  optimal  iteration  complexity  and  maintain 
all  the  nice  features  of  the  original  APL  and  USL  methods.  Meanwhile,  by  simplifying  the  subproblems 
involved  in  the  algorithms  and  solving  them  exactly,  the  FAPL  and  FUSL  methods  reduce  the  iteration  cost 
and  increase  the  accuracy  of  solutions  significantly,  and  thus  overcome  the  drawback  of  these  existing  bundle- 
level  type  methods  applied  to  large-scale  CP  problems.  Furthermore,  by  introducing  a  generic  algorithmic 
framework,  we  extend  the  uniformly  optimal  bundle-level  type  methods  to  unconstrained  problems  and  broaden 
the  applicability  of  these  algorithms.  The  complexity  of  bundle-level  type  methods  for  unconstrained  convex 
optimizations  has  been  analyzed  for  the  first  time  in  the  literature.  The  numerical  results  for  least  square 
problems  and  total  variation  based  image  reconstruction  clearly  demonstrate  the  advantages  of  FAPL  and 
FUSL  methods  over  the  original  APL,  USL  and  some  other  state-of-the-art  first-order  methods. 
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